Abstract

Even after over two decades, the total variation (TV) remains one of the most popular regularizations
for image processing problems and has sparked a tremendous amount of research,
particularly to move from scalar to vector-valued functions. In this paper, we consider the gradient
of a color image as a three-dimensional matrix or tensor with dimensions corresponding to
the spatial extent, the differences to other pixels, and the spectral channels. The smoothness of
this tensor is then measured by taking different norms along the different dimensions. Depending
on the type of these norms, one obtains very different properties of the regularization, leading
to novel models for color images. We call this class of regularizations collaborative total variation
(CTV). On the theoretical side, we characterize the dual norm, the subdifferential and the
proximal mapping of the proposed regularizers. We further prove, with the help of the generalized
concept of singular vectors, that an $\ell^\infty$ channel coupling makes the most prior assumptions
and has the greatest potential to reduce color artifacts. Our practical contributions consist
of an extensive experimental section where we compare the performance of a large number of
collaborative TV methods for inverse problems like denoising, deblurring and inpainting.


1 Introduction 

Many problems in image processing require the choice of a good prior that makes assumptions on 
the structure of the underlying image we seek to estimate. This prior often takes the form of a 
regularization term for an energy functional which is to be minimized. Observing that quadratic 
regularization did not allow recovering sharp discontinuities, Rudin, Osher and Fatemi proposed 
the total variation (TV) penalty [46] for solving inverse problems. The total variation was a pioneering
discontinuity-preserving regularizer in the sense that it assigns the same energy cost to sharp
and smooth transitions. It is therefore one of the simplest (convex) variational models that allows
discontinuities while still penalizing oscillations in the solution.

Although the TV was originally designed for image denoising, it has become one of the most 
popular regularizations for many image processing problems and has sparked a tremendous amount 
of research. While many extensions like anisotropic TV [171 [251 Ii5] , weighted TV [HI [221 |2S] , higher 
order TV [H O IHl [Ml El] , nonlocal TV [HI [Ml Ell SOI SIl ) or nonconvex TV [Ml EZ] have been
proposed, the general idea of penalizing image oscillations with one-homogeneous functions depending
on the spatial derivatives of the image remains the same. A lot of recent research has focused
on extending the classical TV model for grayscale images to vector-valued (color or multichannel)
images. We provide below an initial overview of vectorial total variation, which will be detailed and
linked to our framework in Section 3.

1.1 Vector Valued Total Variation 

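Let $\Omega \subset \mathbb{R}^M$ be a bounded domain, then the scalar total variation of a locally integrable function
$u \in L^1_{\mathrm{loc}}(\Omega, \mathbb{R})$ is

$$\mathrm{TV}(u) := \sup_{\xi \in S} \left\{ \int_\Omega u(x)\, \operatorname{div}(\xi(x))\, dx \right\}, \qquad (1)$$

where $x = (x_1, \ldots, x_M) \in \Omega$ and

$$S = \left\{ \xi \in C_c^1(\Omega, \mathbb{R}^M) : \|\xi(x)\| \le 1,\ \forall x \in \Omega \right\} \qquad (2)$$

is the set of continuously differentiable and bounded functions with compact support in $\Omega$. The
definition given in (1) introduces a dual formulation according to which the TV is the convex
conjugate of the indicator function of the convex set $K_{TV} := \{\operatorname{div}(\xi) : \xi \in S\}$. For a differentiable
function $u \in C^1(\Omega, \mathbb{R})$, one has $\mathrm{TV}(u) = \int_\Omega |\nabla u(x)|\, dx$. Note that the TV can be defined differently
depending on the norm used in (2). For a better understanding, let us restrict ourselves to
$u \in C^1(\Omega, \mathbb{R})$ and denote its gradient by $\nabla u(x) = (\partial_{x_1} u(x), \ldots, \partial_{x_M} u(x)) \in \mathbb{R}^M$ at each $x \in \Omega$. Therefore,
using $\|\cdot\|_2$ as dual norm leads to the isotropic TV, $\int_\Omega \sqrt{\sum_{m=1}^{M} (\partial_{x_m} u(x))^2}\, dx$, whereas the anisotropic
TV follows from choosing $\|\cdot\|_\infty$ in (2).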
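To make the distinction between the isotropic and the anisotropic TV concrete, the following sketch evaluates both on small discrete images. The forward-difference discretization with zero differences at the last row and column, as well as the helper names `forward_gradient`, `tv_isotropic` and `tv_anisotropic`, are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def forward_gradient(u):
    """Forward differences of a 2D array; the last column/row difference is set to zero (assumed boundary rule)."""
    u = np.asarray(u, dtype=float)
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[:, :-1] = u[:, 1:] - u[:, :-1]   # horizontal differences
    dy[:-1, :] = u[1:, :] - u[:-1, :]   # vertical differences
    return dx, dy

def tv_isotropic(u):
    """Sum over pixels of the Euclidean norm of the discrete gradient."""
    dx, dy = forward_gradient(u)
    return float(np.sum(np.sqrt(dx**2 + dy**2)))

def tv_anisotropic(u):
    """Sum over pixels of the l^1 norm of the discrete gradient."""
    dx, dy = forward_gradient(u)
    return float(np.sum(np.abs(dx) + np.abs(dy)))

# Vertical edge: only one derivative is active per pixel, so both variants agree.
step = np.zeros((4, 4))
step[:, 2:] = 1.0

# Diagonal ramp u[i, j] = i + j: both derivatives are active at most pixels, so the variants differ.
ramp = np.add.outer(np.arange(3.0), np.arange(3.0))

print(tv_isotropic(step), tv_anisotropic(step))
print(tv_isotropic(ramp), tv_anisotropic(ramp))
```

On axis-aligned edges the two variants coincide, while on diagonal structures the anisotropic variant is larger, which is the usual reason for preferring the isotropic form when rotation invariance matters.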

The idea of the vectorial total variation is to extend the above definitions to vector-valued functions
$u : \Omega \to \mathbb{R}^C$. A major decision with color images is how to couple channels. A straightforward
approach proposed by Blomgren and Chan [4] consists in using a global channel coupling by penalizing
the $\ell^2$ norm of the TV contributions across channels. However, local coupling outperforms
global coupling in many theoretical and practical aspects [?]. In this setting, most of the methods
in color image reconstruction used an $\ell^1$ or $\ell^2$ norm to penalize the TV of the channels at each pixel
[?]. Additionally, some interesting approaches incorporated a change of color space [?].
Further versions of vectorial TV in the literature are based on the singular values of the submatrices one
obtains by fixing a pixel location and looking at the remaining matrix in the channel and derivative
dimensions. Important cases are the Schatten-$\infty$ norm [23], which penalizes the largest singular
value, and the nuclear norm or Schatten-1 norm [34], which is a convex relaxation of minimizing
the rank of the image Jacobian at each pixel [42].

1.2 Problem Formulation 

For the sake of simplicity, we will consider discrete versions of the TV for the remainder of this
paper. Let us define the Euclidean spaces $X := \mathbb{R}^{N \times C}$ and $Y := \mathbb{R}^{N \times M \times C}$, where $N$ is the number
of pixels of the image, $M$ is the number of directional derivatives, and $C$ is the number of color
channels. We thus consider a color image as a two-dimensional matrix of size $N \times C$ denoted by
$u = (u_1, \ldots, u_C) \in X$, with $u_k = (u_{1,k}, \ldots, u_{N,k})^\top \in \mathbb{R}^N$ for each channel $k \in \{1, \ldots, C\}$. On the
other hand, we define the linear operator $K : X \to Y$ such that $Ku \in Y$ is a three-dimensional matrix
or tensor. In the rest of the paper, we use the colon to denote all elements along one dimension. For
example, the $\ell^p$ norm of $A \in Y$ with respect to its third dimension reads
$\|A_{i,j,:}\|_p := \left( \sum_k |A_{i,j,k}|^p \right)^{1/p}$.
The general problem we are concerned with is

$$\min_{u \in X}\ G(u) + \|Ku\|_{\vec{b},a}, \qquad (3)$$

where $G : X \to \mathbb{R}$ is a proper, convex, lower semicontinuous (l.s.c.) functional and $\|\cdot\|_{\vec{b},a}$ is a
collaborative sparsity enforcing norm penalizing the gradient of the color image, to be detailed later.

In this paper, we propose a general and intuitive framework that allows us not only to handle
pre-existing vectorial total variation models, but also to introduce some new interesting regularizations
for color image processing. Our idea is that, in a discrete setting, the gradient of a vector-valued
image is nothing but a three-dimensional matrix or tensor with the dimensions corresponding to the
spatial extent, the directional derivatives considered as linear operators containing the differences
to other pixels, and the color channels. The energy or smoothness of this tensor can be measured by
taking different norms along the different dimensions. Depending on the types of norms, one obtains
very different properties of the regularization.

Two relevant examples immediately arise from the proposed framework. For the sake of clarity,
let us write $A := Ku \in Y$. If we first take the $\ell^p$ norm along the color dimension, then the $\ell^q$ norm
along the derivative dimension of the remaining 2D matrix and, finally, the $\ell^r$ norm of the final pixel
vector, one obtains the $\ell^{p,q,r}$ norm:

$$\|A\|_{p,q,r} := \left( \sum_{i=1}^{N} \left( \sum_{j=1}^{M} \left( \sum_{k=1}^{C} |A_{i,j,k}|^p \right)^{q/p} \right)^{r/q} \right)^{1/r}. \qquad (4)$$

In (4), any of the indices $p$, $q$ or $r$ being equal to infinity means taking the maximum of the absolute
values along the corresponding dimension. A second important example consists of penalizing with
the $\ell^p$ norm the singular values of the 2D matrices arising from each pixel (that is, the Schatten-$p$
norm, denoted by $\|\cdot\|_{S^p}$), and then applying the $\ell^q$ norm along the remaining vector:

$$(S^p, \ell^q)(A) := \left( \sum_{i=1}^{N} \|A_{i,:,:}\|_{S^p}^{q} \right)^{1/q}. \qquad (5)$$
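The two norms in (4) and (5) can be evaluated directly on a tensor of shape $N \times M \times C$. The sketch below is only an illustration: the function names `lpqr_norm` and `schatten_lq_norm`, the helper `_lp`, and the small test tensor are hypothetical and not part of the paper.

```python
import numpy as np

def _lp(x, p, axis):
    """l^p norm of |x| along one axis; p = np.inf takes the maximum."""
    x = np.abs(x)
    if np.isinf(p):
        return x.max(axis=axis)
    return (x ** p).sum(axis=axis) ** (1.0 / p)

def lpqr_norm(A, p, q, r):
    """l^{p,q,r}(col, der, pix) norm of A with shape (N, M, C), following (4)."""
    inner = _lp(A, p, axis=2)        # l^p over color channels     -> shape (N, M)
    middle = _lp(inner, q, axis=1)   # l^q over derivatives        -> shape (N,)
    return float(_lp(middle, r, axis=0))  # l^r over pixels        -> scalar

def schatten_lq_norm(A, p, q):
    """(S^p, l^q) norm of A with shape (N, M, C), following (5)."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values per pixel, shape (N, min(M, C))
    per_pixel = _lp(s, p, axis=1)           # Schatten-p norm of each M x C slice
    return float(_lp(per_pixel, q, axis=0))

# Hypothetical tensor with N = 2 pixels, M = 2 derivatives, C = 3 channels.
A = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
              [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]])

print(lpqr_norm(A, 1, 1, 1), lpqr_norm(A, 2, 1, 1), lpqr_norm(A, np.inf, 1, 1))
print(schatten_lq_norm(A, 1, 1), schatten_lq_norm(A, 2, 1), schatten_lq_norm(A, np.inf, 1))
```

For a fixed tensor the value decreases as the exponent of the channel norm grows, and the Schatten-2 variant coincides with $\ell^{2,2,1}$ because the Schatten-2 norm of a matrix is its Frobenius norm.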


As an illustrative example, Figure 1 shows the results of a numerical experiment regarding the
ability of different channel couplings to suppress color artifacts. We use a synthetic image where we
leave open whether the colored wave pattern is signal content or noise. We see that the channel-by-channel
regularization due to the $\ell^1$ norm eliminates all noise from constant regions but the color structure
of the underlying image is not touched. On the contrary, the $\ell^\infty$ norm leads to the strongest channel
coupling and is able to remove the color oscillations completely. In between both, the $\ell^2$ channel
coupling significantly reduces the colors around the white square but does not eliminate them.
Therefore, we expect a color coupling with an $\ell^p$ norm to be stronger the larger $p$ is.


1.3 Contributions and Preliminary Works 

We streamline below the novelty of our approach. The major contributions of this work are: 

• The introduction of a large family of (discrete) convex energy functionals that generalize the 
TV to vector-valued images. Motivated by recent advances in compressed sensing, we interpret 
the total variation as looking for an image for which the gradient is sparse. We use collaborative 
sparsity [52] to model different types of TV which are then used in a variational formulation 
to provide regularized solutions of ill-posed inverse problems in color imaging. We call this 
family of regularizers Collaborative Total Variation (CTV). 
[Figure 1 panels, left to right: Noisy; $\ell^1$ coupling; $\ell^2$ coupling; $\ell^\infty$ coupling]

Figure 1: Denoising a synthetic image where we leave open whether the colored wave pattern is signal
content or noise. One observes that uncoupling channels ($\ell^1$ norm) keeps the colored waves but the
strongest channel coupling ($\ell^\infty$ norm) eliminates them. In between both approaches, the $\ell^2$ channel
coupling suppresses but does not eliminate the wave pattern.


• The definition of general collaborative sparsity enforcing norms that characterize all CTV 
regularizations. We further compute their dual norms and their subdifferentials, which play a 
direct role in computing optimality conditions of several regularized problems. 

• The proof, with the help of the generalized concept of singular vectors [3], that an $\ell^\infty$ channel
coupling leads to the strongest correlation, makes the most prior assumptions and has the
greatest potential to reduce color artifacts.

• The proposal of sophisticated collaborative norms such as $\ell^{\infty,\infty,1}$,
which lead to novel methods for color images. All variants can be solved very efficiently by
using the same splitting scheme, for instance, the primal-dual hybrid gradient (PDHG) method
[?]. Since the key to obtaining a fast PDHG algorithm is an efficient evaluation of the
proximity operators, they are provided in detail.

• An extensive experimental evaluation of some of the proposed CTV methods on several image
processing problems, such as denoising, deblurring or inpainting. A detailed performance
comparison on different databases for color image denoising using the ROF model together
with the proposed collaborative TV regularizations is provided in the companion paper [15].
We further include some experiments for cartoon and texture decomposition. Code and an
online demo to reproduce all examples will be made available soon.

In the original conference paper [14], which contains preliminary parts of this work, we proposed
to penalize the $\ell^{p,q,r}$ norm of the three-dimensional structure underlying the nonlocal gradient for
color image reconstruction. In particular, the newly proposed NLTV model yielded superior
results. In the current paper, we extend the original framework in order to include more general
collaborative norms: we propose novel norms and further incorporate Schatten $(S^p, \ell^q)$ norms.
We also provide a mathematical justification of the superiority of the $\ell^\infty$ coupling for restoring
images with high inter-channel correlation, and we develop general properties of collaborative norms
that are useful in optimization. Finally, we give a detailed performance comparison of more vectorial
TV methods derived from the proposed framework in additional image processing problems.

While this work was being written, the conference paper by Miyata and Sakai [35], which pioneered
the $\ell^\infty$ channel coupling, came to our attention. To the best of our knowledge, [35] is the only paper
that uses the supremum norm for vectorial TV. However, the authors proposed to first perform a
color transform that reduces the inter-channel correlation. From our point of view, this change of
color space is counter-intuitive when combined with the strong inter-channel coupling of $\ell^\infty$. One of
our main contributions is to introduce the $\ell^\infty$ norm in a straightforward way and efficiently exploit
its properties.
1.4 Outline of the Paper

The rest of the paper is organized as follows. The next section introduces the definition of collaborative
norms and develops some general properties which play a direct role when computing optimality
conditions of regularized problems. In Section 3, we summarize the current literature on different
definitions for extending the TV to multichannel images. All of them are analyzed as special cases
of the proposed approach. We investigate in Section 4 which channel coupling leads to the strongest
correlation, makes the most prior assumptions and has the greatest potential to reduce color artifacts.
In Section 5, we give detailed explanations on how to determine minimizers of typical image
processing problems using CTV as a prior. In particular, we write down the proximity operators for
all types of regularizations discussed in this paper. We compare different CTV methods in numerical
experiments for denoising, deblurring and inpainting of color images in Section 6 before we draw
conclusions in Section 7.

2 Collaborative Total Variation Regularization 

In the following, we introduce a novel regularization family which we use to solve inverse problems 
in vector-valued image processing within a variational setting. The proposed models are based on 
the use of collaborative sparsity enforcing norms, which will be abbreviated as collaborative norms, 
that are defined below. 

2.1 Definition of Collaborative Norms 

By considering the derivatives of a color image as a linear operator, one obtains a three-dimensional 
matrix or tensor with one dimension corresponding to the pixels in the image, one dimension corre¬ 
sponding to the directional derivatives, and one dimension corresponding to the color channels. 

Example 1. For illustrative purposes, suppose that a color image given on a rectangular domain
of size $N_w \times N_h$ has been rearranged from left to right and from top to bottom into a matrix
$u = (u_1, u_2, u_3) \in \mathbb{R}^{N \times 3}$. Consider $K$ to be the standard gradient computed via forward differences along
the $x$- and $y$-directions. Then, the two-dimensional submatrix obtained by fixing the $n$-th pixel in the
first dimension is

$$\begin{pmatrix} u_{n+1,1} - u_{n,1} & u_{n+1,2} - u_{n,2} & u_{n+1,3} - u_{n,3} \\ u_{n+N_w,1} - u_{n,1} & u_{n+N_w,2} - u_{n,2} & u_{n+N_w,3} - u_{n,3} \end{pmatrix}.$$
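The construction in Example 1 can be sketched in a few lines. The function name `gradient_tensor` and the choice of setting differences to zero at the right and bottom boundaries are assumptions for illustration; the example in the text does not specify the boundary treatment.

```python
import numpy as np

def gradient_tensor(img):
    """Forward-difference gradient of an image of shape (N_h, N_w, C).

    Returns a tensor of shape (N, 2, C) with N = N_h * N_w, where pixels are
    enumerated from left to right and from top to bottom (row-major order).
    Differences across the right and bottom boundaries are set to zero (assumption).
    """
    img = np.asarray(img, dtype=float)
    n_h, n_w, c = img.shape
    g = np.zeros((n_h, n_w, 2, c))
    g[:, :-1, 0, :] = img[:, 1:, :] - img[:, :-1, :]   # u_{n+1,k}   - u_{n,k}
    g[:-1, :, 1, :] = img[1:, :, :] - img[:-1, :, :]   # u_{n+N_w,k} - u_{n,k}
    return g.reshape(n_h * n_w, 2, c)

# Hypothetical 3 x 4 color image whose flattened form satisfies u[n, k] = 3 n + k.
img = np.arange(36, dtype=float).reshape(3, 4, 3)
A = gradient_tensor(img)

n = 5                      # an interior pixel (row 1, column 1)
print(A.shape)             # (N, M, C)
print(A[n])                # the 2 x 3 submatrix of Example 1 at pixel n
```

Fixing the first index of the returned tensor gives exactly the $2 \times 3$ submatrix displayed in Example 1, with one row per derivative direction and one column per color channel.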

Example 2. Let us see how a neighbourhood filter fits in our framework for a color image with
four pixels. Let $K$ be defined as the nonlocal gradient with respect to a weighting function $w$, which
measures the similarity between two pixels in the image. In this case, we have $N = 4$, $M = 4$, and
$C = 3$. Contrary to the previous example, we fix here the color dimension to the $k$-th channel.
Therefore, the remaining two-dimensional submatrix along pixel and derivative dimensions is

$$\begin{pmatrix}
0 & w_{1,2}\,(u_{1,k} - u_{2,k}) & w_{1,3}\,(u_{1,k} - u_{3,k}) & w_{1,4}\,(u_{1,k} - u_{4,k}) \\
w_{2,1}\,(u_{2,k} - u_{1,k}) & 0 & w_{2,3}\,(u_{2,k} - u_{3,k}) & w_{2,4}\,(u_{2,k} - u_{4,k}) \\
w_{3,1}\,(u_{3,k} - u_{1,k}) & w_{3,2}\,(u_{3,k} - u_{2,k}) & 0 & w_{3,4}\,(u_{3,k} - u_{4,k}) \\
w_{4,1}\,(u_{4,k} - u_{1,k}) & w_{4,2}\,(u_{4,k} - u_{2,k}) & w_{4,3}\,(u_{4,k} - u_{3,k}) & 0
\end{pmatrix}.$$

In general, the previous matrix is of size $N \times N$. However, only a few nonzero weights are
typically used in practical applications.
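The nonlocal gradient of Example 2 can be written as a single broadcasting expression. The function name `nonlocal_gradient`, the pixel values and the weight matrix below are hypothetical; in practice the weights would come from a patch similarity measure and most of them would be zero.

```python
import numpy as np

def nonlocal_gradient(u, w):
    """Nonlocal gradient of u (shape (N, C)) for weights w (shape (N, N)).

    Entry [i, j, k] equals w[i, j] * (u[i, k] - u[j, k]), so the result has
    shape (N, N, C): pixels x nonlocal differences x color channels.
    """
    u = np.asarray(u, dtype=float)
    w = np.asarray(w, dtype=float)
    return w[:, :, None] * (u[:, None, :] - u[None, :, :])

# Four pixels with three channels each (hypothetical values).
u = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Symmetric illustrative weights; the diagonal does not matter since u[i] - u[i] = 0.
w = np.array([[0.0, 0.5, 0.2, 0.1],
              [0.5, 0.0, 0.3, 0.4],
              [0.2, 0.3, 0.0, 0.6],
              [0.1, 0.4, 0.6, 0.0]])

A = nonlocal_gradient(u, w)
k = 0
print(A[:, :, k])   # the 4 x 4 submatrix of Example 2 for channel k
```

Fixing the last index reproduces the matrix of Example 2, including its zero diagonal.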

Although mainly $\ell^1$ and $\ell^2$ norms have been used in the literature so far, it makes sense
to look at vectorial TV as applying the increasingly popular mixed norms to the gradient of
the image (see [?] and references therein). For a general tensor $A \in Y$, we introduce the
following family of norms that we call collaborative norms.

Definition 1. Let $\|\cdot\|_a : \mathbb{R}^N \to \mathbb{R}$ be any vector norm and $\|\cdot\|_{\vec{b}} : \mathbb{R}^{M \times C} \to \mathbb{R}$ any matrix norm.
Then, the collaborative norm of $A \in \mathbb{R}^{N \times M \times C}$, which will be denoted by $\|\cdot\|_{\vec{b},a}$, is
defined as

$$\|A\|_{\vec{b},a} := \|v\|_a, \qquad v_i := \|A_{i,:,:}\|_{\vec{b}}, \quad \forall i \in \{1, \ldots, N\}, \qquad (6)$$

where $A_{i,:,:}$ is the (two-dimensional) submatrix obtained by stacking the second and third dimensions
of $A$ at each $i$-th position in the first dimension.

We note that the examples given in (4) and (5) follow from the above definition. Indeed, the
$\ell^{p,q,r}$ norm arises from taking $\|\cdot\|_{\vec{b}}$ as the matrix $\ell^{p,q}$ norm and $\|\cdot\|_a$ as the $\ell^r$ norm. On the
other hand, the $(S^p, \ell^q)$ norm is obtained when one considers $\|\cdot\|_{\vec{b}}$ to be the matrix Schatten-$p$ norm,
that is, penalizing the $\ell^p$ norm of the singular values of the submatrix $A_{i,:,:}$, and $\|\cdot\|_a$ to be the $\ell^q$ norm.

Since the collaborative norms defined in (6) are not invariant to permutations of the dimensions,
we propose to write $\|A\|_{\vec{b},a}(col, der, pix)$ for first applying the matrix norm $\|\cdot\|_{\vec{b}}$ to the submatrix
obtained by fixing each pixel and looking at the remaining derivative and channel dimensions, and
then using the vectorial norm $\|\cdot\|_a$ along the pixel dimension. Importantly, note that our framework
covers any transform along each of the dimensions; in particular, it allows us to incorporate color space
transforms before applying any collaborative norm.
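Definition 1 composes an arbitrary matrix norm per pixel with an arbitrary vector norm over pixels, and the order in which the dimensions are processed matters. The sketch below illustrates both points; the names `collaborative_norm`, `mixed_norm` and `l1`, and the single-pixel tensor, are assumptions chosen for this example only.

```python
import numpy as np

def collaborative_norm(A, matrix_norm, vector_norm):
    """Apply matrix_norm to every slice A[i] (shape (M, C)), then vector_norm over pixels."""
    v = np.array([matrix_norm(A[i]) for i in range(A.shape[0])])
    return float(vector_norm(v))

def mixed_norm(p, q):
    """Matrix l^{p,q} norm: l^p along the second axis of B, then l^q over the results."""
    def norm(B):
        return np.linalg.norm(np.linalg.norm(B, ord=p, axis=1), ord=q)
    return norm

def l1(v):
    return np.linalg.norm(v, ord=1)

# One pixel (N = 1), two derivatives (M = 2), three channels (C = 3).
A = np.array([[[3.0, 0.0, 0.0],
               [4.0, 0.0, 0.0]]])

# l^{2,1,1}(col, der, pix): l^2 over channels first, then l^1 over derivatives.
col_der_pix = collaborative_norm(A, mixed_norm(2, 1), l1)

# l^{2,1,1}(der, col, pix): l^2 over derivatives first, obtained by transposing each slice.
der_col_pix = collaborative_norm(A, lambda B: mixed_norm(2, 1)(B.T), l1)

# (S^1(col, der), l^1(pix)): nuclear norm of each slice.
nuclear_pix = collaborative_norm(A, lambda B: np.linalg.norm(B, ord="nuc"), l1)

print(col_der_pix, der_col_pix, nuclear_pix)
```

The same exponents give different values depending on which dimension is treated first, which is why the ordering $(col, der, pix)$ is made explicit in the notation.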

2.2 General Properties of Collaborative Norms 

It is well known that duality plays a direct role in computing optimality conditions of several
regularized problems. The following result characterizes the dual norm to any collaborative norm.

Theorem 1. Let $\|\cdot\|_{\vec{b}^*}$ and $\|\cdot\|_{a^*}$ denote the dual norms to $\|\cdot\|_{\vec{b}}$ and $\|\cdot\|_a$, respectively. Consider
$A \in \mathbb{R}^{N \times M \times C}$ and define $v \in \mathbb{R}^N$ such that $v_i := \|A_{i,:,:}\|_{\vec{b}^*}$ for each $i \in \{1, \ldots, N\}$. If $\|v\|_{a^*}$ only
depends on the absolute values of the $v_i$'s, then the dual norm to $\|\cdot\|_{\vec{b},a}$, denoted by $\|\cdot\|_{\vec{b}^*,a^*}$, is

$$\|A\|_{\vec{b}^*,a^*} = \|v\|_{a^*}, \qquad v_i = \|A_{i,:,:}\|_{\vec{b}^*}, \quad \forall i \in \{1, \ldots, N\}. \qquad (7)$$

In other words, the dual norm of the composite is the composite of the dual norms.

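Proof. We aim at proving that

$$\sup \left\{ \langle A, B \rangle : \|B\|_{\vec{b}^*,a^*} \le 1 \right\} = \|A\|_{\vec{b},a},$$

where $\|\cdot\|_{\vec{b}^*,a^*}$ is defined in (7). Let $B \in \mathbb{R}^{N \times M \times C}$ satisfying $\|B\|_{\vec{b}^*,a^*} \le 1$ be fixed but arbitrary,
and define $v_i^A := \|A_{i,:,:}\|_{\vec{b}}$ and $v_i^{B^*} := \|B_{i,:,:}\|_{\vec{b}^*}$ for each $i \in \{1, \ldots, N\}$. Applying Hölder's inequality
for both the $\|\cdot\|_{\vec{b}}$ and $\|\cdot\|_a$ norms yields

$$\langle A, B \rangle = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{C} A_{i,j,k} B_{i,j,k} \le \sum_{i=1}^{N} v_i^A\, v_i^{B^*} \le \|v^A\|_a\, \|v^{B^*}\|_{a^*} = \|A\|_{\vec{b},a}\, \|B\|_{\vec{b}^*,a^*} \le \|A\|_{\vec{b},a}.$$

The proof now reduces to showing that there exists some $B \in \mathbb{R}^{N \times M \times C}$ satisfying $\|B\|_{\vec{b}^*,a^*} \le 1$ for
which the equality holds.

Since $\|\cdot\|_{a^*}$ is the dual norm to $\|\cdot\|_a$, there exists some $z \in \mathbb{R}^N$, $\|z\|_{a^*} \le 1$, such that
$\langle v^A, z \rangle = \|v^A\|_a$. We can additionally assume that $z_i \ge 0$ for all $i \in \{1, \ldots, N\}$. Indeed, suppose that $z_j < 0$
for some $j \in \{1, \ldots, N\}$ and define $\tilde{z}$ with $\tilde{z}_i = z_i$ for $i \ne j$ and $\tilde{z}_j = -z_j$. Since $\|z\|_{a^*}$ only depends
on the absolute values of the $z_i$'s, it follows that $\tilde{z}$ meets $\|\tilde{z}\|_{a^*} \le 1$. If $v_j^A > 0$, then one deduces that
$\langle v^A, z \rangle < \langle v^A, \tilde{z} \rangle$, which contradicts the definition of $\|v^A\|_a$. If $v_j^A = 0$, then
$\|v^A\|_a = \langle v^A, z \rangle = \langle v^A, \tilde{z} \rangle$, so that we need only take $\tilde{z}$ instead of $z$.

On the other hand, since $\|\cdot\|_{\vec{b}^*}$ is the dual norm to $\|\cdot\|_{\vec{b}}$, there exists $y^i \in \mathbb{R}^{M \times C}$, $\|y^i\|_{\vec{b}^*} \le 1$,
such that $\langle A_{i,:,:}, y^i \rangle = \|A_{i,:,:}\|_{\vec{b}}$ for all $i \in \{1, \ldots, N\}$.

Now, it follows from the definitions of $z \in \mathbb{R}^N$ and each $y^i \in \mathbb{R}^{M \times C}$ that

$$\|A\|_{\vec{b},a} = \|v^A\|_a = \langle v^A, z \rangle = \sum_{i=1}^{N} z_i \|A_{i,:,:}\|_{\vec{b}} = \sum_{i=1}^{N} z_i \langle A_{i,:,:}, y^i \rangle = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{C} A_{i,j,k}\, z_i\, y^i_{j,k},$$

from where $\|A\|_{\vec{b},a} = \langle A, B \rangle$ by choosing $B_{i,j,k} = z_i\, y^i_{j,k}$. Furthermore,

$$v_i^{B^*} = \|B_{i,:,:}\|_{\vec{b}^*} = \|z_i y^i\|_{\vec{b}^*} = |z_i| \cdot \|y^i\|_{\vec{b}^*} \le |z_i| = z_i, \quad \forall i \in \{1, \ldots, N\}.$$

Let $v \in \mathbb{R}^N$, $\|v\|_a \le 1$, be fixed but arbitrary, then

$$\langle v^{B^*}, v \rangle = \sum_{i=1}^{N} v_i^{B^*} v_i \le \sum_{i=1}^{N} z_i |v_i| \le \|z\|_{a^*}\, \| |v| \|_a \le 1,$$

where $|v|$ is the vector of absolute values and $\| |v| \|_a = \|v\|_a$, since the dual of a norm that only depends on
absolute values has the same property. This implies
$\|B\|_{\vec{b}^*,a^*} = \|v^{B^*}\|_{a^*} = \sup \{ \langle v^{B^*}, v \rangle : v \in \mathbb{R}^N, \|v\|_a \le 1 \} \le 1$. This means that we
found $B \in \mathbb{R}^{N \times M \times C}$, $\|B\|_{\vec{b}^*,a^*} \le 1$, such that $\langle A, B \rangle = \|A\|_{\vec{b},a}$, which concludes the proof. □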

Theorem 1 states that the $\ell^{p^*,q^*,r^*}$ norm is dual to $\ell^{p,q,r}$, where $p^*$, $q^*$, and $r^*$ denote the Hölder
conjugate exponents of $p$, $q$, and $r$, respectively. Similarly, the dual norm of the Schatten $(S^p, \ell^q)$
norm is $(S^{p^*}, \ell^{q^*})$. Furthermore, we have implicitly proved a Hölder's inequality for collaborative
norms.

Lemma 1. Under the conditions of Theorem 1, we have

$$|\langle A, B \rangle| \le \|A\|_{\vec{b},a}\, \|B\|_{\vec{b}^*,a^*},$$

for any $A, B \in \mathbb{R}^{N \times M \times C}$.
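A quick numerical sanity check of the duality statement and of Hölder's inequality for the $\ell^{p,q,r}$ family can be sketched as follows. The helper names (`lpqr_norm`, `conjugate`), the random test tensors and the selection of exponents are hypothetical; the check illustrates the result and is of course not a proof.

```python
import numpy as np

def _lp(x, p, axis):
    x = np.abs(x)
    if np.isinf(p):
        return x.max(axis=axis)
    return (x ** p).sum(axis=axis) ** (1.0 / p)

def lpqr_norm(A, p, q, r):
    """l^{p,q,r}(col, der, pix) norm of a tensor of shape (N, M, C)."""
    return float(_lp(_lp(_lp(A, p, axis=2), q, axis=1), r, axis=0))

def conjugate(p):
    """Hoelder conjugate exponent p* with 1/p + 1/p* = 1."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2, 3))
B = rng.standard_normal((5, 2, 3))

# Hoelder's inequality |<A, B>| <= ||A||_{p,q,r} ||B||_{p*,q*,r*} for a few exponent triples.
holder_ok = all(
    abs(np.sum(A * B))
    <= lpqr_norm(A, p, q, r) * lpqr_norm(B, conjugate(p), conjugate(q), conjugate(r)) + 1e-12
    for (p, q, r) in [(1, 1, 1), (2, 1, 1), (np.inf, 1, 1), (2, 2, 1), (np.inf, np.inf, 1)]
)

# For l^{2,1,1} the supremum is attained by normalizing every channel vector A[i, j, :].
B_max = A / np.linalg.norm(A, axis=2, keepdims=True)
dual_norm_of_B_max = lpqr_norm(B_max, 2, np.inf, np.inf)
attained = float(np.sum(A * B_max))

print(holder_ok, dual_norm_of_B_max, attained, lpqr_norm(A, 2, 1, 1))
```

The maximizer has unit dual norm and its inner product with $A$ reproduces $\|A\|_{2,1,1}$, in line with the construction of $B$ in the proof of Theorem 1.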

We also furnish ourselves with the subdifferential of the proposed collaborative norms, which will
be useful for computing their proximal mappings.

Theorem 2. Consider $A \in \mathbb{R}^{N \times M \times C}$ and define $v \in \mathbb{R}^N$ such that $v_i := \|A_{i,:,:}\|_{\vec{b}}$ for each
$i \in \{1, \ldots, N\}$. If $\|v\|_{a^*}$ only depends on the absolute values of the $v_i$'s, then the subdifferential of
$\|\cdot\|_{\vec{b},a}$ is given by

$$\partial \left( \|A\|_{\vec{b},a} \right) = \left\{ B \in \mathbb{R}^{N \times M \times C} : \|B\|_{\vec{b}^*,a^*} \le 1 \text{ and } \langle B, A \rangle = \|A\|_{\vec{b},a} \right\}. \qquad (8)$$

Proof. Since $\|\cdot\|_{\vec{b},a}$ is positively one-homogeneous, it is well known that its subdifferential is given
by

$$\partial \left( \|A\|_{\vec{b},a} \right) = \left\{ B \in \mathbb{R}^{N \times M \times C} : \langle B, A \rangle = \|A\|_{\vec{b},a} \text{ and } \langle B, M \rangle \le \|M\|_{\vec{b},a}\ \forall M \right\}.$$

Recall now that the Legendre-Fenchel transform of a proper convex function is defined as
$f^*(y) := \sup_x \{ \langle y, x \rangle - f(x) \}$. Furthermore, the Legendre-Fenchel transform of a norm $f(x) := \|x\|$ turns out to
be the indicator function of the unit ball of the dual norm:

$$f^*(y) = \begin{cases} 0 & \text{if } \|y\|_* \le 1, \\ +\infty & \text{otherwise.} \end{cases}$$

We refer the reader to [?] for more details. Therefore, taking the supremum over all
$M \in \mathbb{R}^{N \times M \times C}$ in $\langle B, M \rangle \le \|M\|_{\vec{b},a}$ yields $f^*(B) \le 0$. Due to Theorem 1, we necessarily have that
$\|B\|_{\vec{b}^*,a^*} \le 1$, which ends the proof. □
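As a worked instance of the characterization in (8), consider the $\ell^{2,2,1}$ norm, whose dual is $\ell^{2,2,\infty}$ by Theorem 1. The following derivation is a sketch added for illustration; it recovers the familiar group-sparsity subdifferential, and $\|\cdot\|_F$ denotes the Frobenius norm of a slice.

```latex
% Subdifferential of the l^{2,2,1} norm via (8).
% Primal norm:  \|A\|_{2,2,1}      = \sum_{i=1}^{N} \|A_{i,:,:}\|_F
% Dual norm:    \|B\|_{2,2,\infty} = \max_{1 \le i \le N} \|B_{i,:,:}\|_F
\begin{align*}
B \in \partial\big(\|A\|_{2,2,1}\big)
  &\iff \max_{i} \|B_{i,:,:}\|_F \le 1
        \quad\text{and}\quad
        \sum_{i=1}^{N} \langle B_{i,:,:}, A_{i,:,:} \rangle = \sum_{i=1}^{N} \|A_{i,:,:}\|_F .
\intertext{By the Cauchy--Schwarz inequality,
$\langle B_{i,:,:}, A_{i,:,:} \rangle \le \|B_{i,:,:}\|_F \, \|A_{i,:,:}\|_F \le \|A_{i,:,:}\|_F$
for every $i$, so the sums can only agree if each term attains its bound. Hence}
B_{i,:,:} &= \frac{A_{i,:,:}}{\|A_{i,:,:}\|_F}
  \quad \text{if } A_{i,:,:} \neq 0,
  \qquad
  \|B_{i,:,:}\|_F \le 1 \ \text{arbitrary}
  \quad \text{if } A_{i,:,:} = 0 .
\end{align*}
```

At pixels with a nonzero gradient the subgradient is unique and points along the normalized Jacobian, whereas at flat pixels it is any element of the unit Frobenius ball.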


3 Vectorial TV Revisited 

The collaborative norms defined in the previous section cover most of the pre-existing definitions of
TV for vector-valued images, the most relevant of which are displayed in Table 1. For nonlocal TV
based models, we refer the reader to our conference paper [14].


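Literature | Continuous Formulation | Collaborative TV
[1] | $\sum_{k=1}^{C} \int_\Omega \sqrt{(\partial_x u_k(x))^2 + (\partial_y u_k(x))^2}\, dx$ | $\ell^{2,1,1}(der, pix, col)$
Anisotropic variant | $\sum_{k=1}^{C} \int_\Omega \left( |\partial_x u_k(x)| + |\partial_y u_k(x)| \right) dx$ | $\ell^{1,1,1}(der, pix, col)$
[4] | $\sqrt{\sum_{k=1}^{C} \left( \int_\Omega \sqrt{(\partial_x u_k(x))^2 + (\partial_y u_k(x))^2}\, dx \right)^2}$ | $\ell^{2,1,2}(der, pix, col)$
Anisotropic variant | $\sqrt{\sum_{k=1}^{C} \left( \int_\Omega \left( |\partial_x u_k(x)| + |\partial_y u_k(x)| \right) dx \right)^2}$ | $\ell^{1,1,2}(der, pix, col)$
[?] | $\int_\Omega \sqrt{\sum_{k=1}^{C} (\partial_x u_k(x))^2 + (\partial_y u_k(x))^2}\, dx$ | $\ell^{2,2,1}(der, col, pix)$
Anisotropic variants | $\int_\Omega \left( \sqrt{\sum_{k=1}^{C} (\partial_x u_k(x))^2} + \sqrt{\sum_{k=1}^{C} (\partial_y u_k(x))^2} \right) dx$ | $\ell^{2,1,1}(col, der, pix)$
 | $\int_\Omega \sqrt{\sum_{k=1}^{C} \left( |\partial_x u_k(x)| + |\partial_y u_k(x)| \right)^2}\, dx$ | $\ell^{1,2,1}(der, col, pix)$
Strong coupling | $\int_\Omega \left( \max_{1 \le k \le C} |\partial_x u_k(x)| + \max_{1 \le k \le C} |\partial_y u_k(x)| \right) dx$ | $\ell^{\infty,1,1}(col, der, pix)$
 | $\int_\Omega \max_{1 \le k \le C} \left( |\partial_x u_k(x)| + |\partial_y u_k(x)| \right) dx$ | $\ell^{1,\infty,1}(der, col, pix)$
Isotropic variants | $\int_\Omega \sqrt{\left( \max_{1 \le k \le C} |\partial_x u_k(x)| \right)^2 + \left( \max_{1 \le k \le C} |\partial_y u_k(x)| \right)^2}\, dx$ | $\ell^{\infty,2,1}(col, der, pix)$
 | $\int_\Omega \max_{1 \le k \le C} \sqrt{(\partial_x u_k(x))^2 + (\partial_y u_k(x))^2}\, dx$ | $\ell^{2,\infty,1}(der, col, pix)$
Supremum variant | $\int_\Omega \max \left\{ \max_{1 \le k \le C} |\partial_x u_k(x)|,\ \max_{1 \le k \le C} |\partial_y u_k(x)| \right\} dx$ | $\ell^{\infty,\infty,1}(col, der, pix)$
[?] | $\int_\Omega \sum_{i=1}^{r} |\sigma_i(\nabla u(x))|\, dx$ | $(S^1(col, der), \ell^1(pix))$
Frobenius norm | $\int_\Omega \|\nabla u(x)\|_F\, dx$ | $(S^2(col, der), \ell^1(pix))$
[?] | $\int_\Omega \max_{1 \le i \le r} \sigma_i(\nabla u(x))\, dx$ | $(S^\infty(col, der), \ell^1(pix))$

Table 1: Overview of local vectorial TV approaches and the way they fit in our framework.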

Approaches to defining vectorial TV regularizations can roughly be divided into two classes.
The first class of methods extends the definition of the scalar case (1) to vector-valued images by
introducing a suitable channel coupling. The second class of approaches emerges when considering
the Riemannian geometry of the image manifold. All of them are analyzed below as special cases of
CTV. We formally use continuous notations even though our framework is given in the discrete
setting.


3.1 Vectorial TV Models from Channel Coupling 


The first known extension of the total variation to vector-valued images is due to Blomgren and
Chan [4]. They applied the Euclidean norm to the vector obtained from the TV contributions across
channels, that is,

$$\mathrm{VTV}(u) := \sqrt{\sum_{k=1}^{C} \left( \mathrm{TV}(u_k) \right)^2}. \qquad (9)$$

From the Euler-Lagrange equation associated to (9), one easily observes that there is a global weak
channel coupling so that the same per-channel weight is used for all pixels. Consequently, this
model favours the restoration of images for which similar noise is measured in each channel. In our
framework, the vectorial TV proposed in [4] can be written as an $\ell^{2,1,2}(der, pix, col)$ penalty.

Probably the simplest way to introduce multichannel TV is to sum up the contributions of
each channel separately [1], which leads to

$$\mathrm{VTV}(u) := \sum_{k=1}^{C} \mathrm{TV}(u_k). \qquad (10)$$

Depending on the coupling used along the derivative dimension, one obtains the isotropic version,
$\ell^{2,1,1}(der, pix, col)$, which was the one originally proposed in [1], or the anisotropic version,
$\ell^{1,1,1}(der, pix, col)$. As pointed out by Goldluecke et al. [23], the drawbacks underlying this approach
are color smearing and edge distortion because of the missing channel coupling. We can
expect (10) to be a good choice if there is no particular relation between channels.

In [?], the isotropic vectorial TV with a local $\ell^2$ channel coupling is proposed:

$$\mathrm{VTV}(u) := \int_\Omega \sqrt{\sum_{k=1}^{C} (\partial_x u_k(x))^2 + (\partial_y u_k(x))^2}\, dx, \qquad (11)$$

which is equivalent to the collaborative norm $\ell^{2,2,1}(der, col, pix)$. Blomgren and Chan [4] noted that
this method actually favours gray-value images over colored ones, which leads to color smearing in
denoising applications.
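The three couplings (9), (10) and (11) can be compared on toy images. The sketch below assumes a forward-difference discretization with zero boundary differences, and the function names and test images are hypothetical. It contrasts an image whose edge is aligned across channels with one whose channel edges sit at different locations.

```python
import numpy as np

def channel_gradients(img):
    """Forward differences per channel for an image of shape (H, W, C); zero at the boundary (assumption)."""
    img = np.asarray(img, dtype=float)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, :-1] = img[:, 1:] - img[:, :-1]
    dy[:-1, :] = img[1:, :] - img[:-1, :]
    return dx, dy

def vtv_channelwise(img):
    """Isotropic version of (10): sum of the per-channel TVs, l^{2,1,1}(der, pix, col)."""
    dx, dy = channel_gradients(img)
    return float(np.sqrt(dx**2 + dy**2).sum())

def vtv_global_l2(img):
    """Global coupling (9): Euclidean norm of the per-channel TVs, l^{2,1,2}(der, pix, col)."""
    dx, dy = channel_gradients(img)
    tv_per_channel = np.sqrt(dx**2 + dy**2).sum(axis=(0, 1))
    return float(np.linalg.norm(tv_per_channel))

def vtv_local_l2(img):
    """Local coupling (11): l^2 over derivatives and channels at each pixel, l^{2,2,1}(der, col, pix)."""
    dx, dy = channel_gradients(img)
    return float(np.sqrt((dx**2 + dy**2).sum(axis=2)).sum())

# Gray image: the same vertical edge in all three channels.
gray = np.zeros((4, 4, 3))
gray[:, 2:, :] = 1.0

# Color image: edges at different columns in two channels, third channel flat.
shifted = np.zeros((4, 4, 3))
shifted[:, 2:, 0] = 1.0
shifted[:, 1:, 1] = 1.0

for name, img in [("gray", gray), ("shifted", shifted)]:
    print(name, vtv_global_l2(img), vtv_local_l2(img), vtv_channelwise(img))
```

For the aligned edge the local coupling is cheaper than the channel-by-channel sum, whereas for misaligned edges both cost the same. This is one way to see why (11) favours gray-value structures, as noted above.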

The inclusion of additional color transforms has been proposed to improve the performance of
vectorial TV methods. It is well known that RGB channels of natural images are highly correlated. In
view of this, some researchers incorporated different color transforms into the definition of vectorial
TV and penalized the gradient of each component in the new basis separately [?]:

$$\mathrm{VTV}(u) := \sum_{k=1}^{C} \mathrm{TV}(u_k \circ \psi), \qquad (12)$$

where $\psi : \mathbb{R}^C \to \mathbb{R}^C$ is an orthonormal transform between color spaces. The key idea is to choose
$\psi$ such that it provides effective reduction of the correlation among channels. Note that (12) is
equivalent to penalizing the collaborative norm $\ell^{1,1,1}(der, pix, \psi(col))$ for the anisotropic variant and
$\ell^{2,1,1}(der, pix, \psi(col))$ for the isotropic variant.
3.2 Vectorial TV Models from Riemann Geometry

A color image can be considered as a parametric two-dimensional manifold embedded in a $C$-dimensional
space [?]. In this framework, the metric tensor of the manifold is analogous to the structure tensor
of the image, that is, $(\nabla u)^\top \nabla u$. Therefore, the eigenvectors of $(\nabla u)^\top \nabla u$ determine the directions
of maximal and minimal change and the eigenvalues, which will be respectively denoted by $\lambda^+$ and
$\lambda^-$, give their rate of change.

In this setting, Sapiro [?] introduced the following general vectorial TV model:

$$\mathrm{VTV}(u) := \int_\Sigma f(\lambda^+, \lambda^-)\, dx, \qquad (13)$$

where $\Sigma$ denotes the image manifold and $f$ is a suitable scalar-valued function. In general, (13) is
defined for differentiable functions, and only in special cases does one have dual formulations that extend
it to locally integrable functions. This is the case for the Frobenius norm of the gradient given by

$$\mathrm{VTV}(u) := \int_\Omega \|\nabla u(x)\|_F\, dx, \qquad (14)$$

which follows from (13) by considering $f(\lambda^+, \lambda^-) = \sqrt{\lambda^+ + \lambda^-}$. Note that (14) is equal to the
definition of the vectorial TV given in (11) and, thus, either $\ell^{2,2,1}(der, col, pix)$ or $(S^2(col, der), \ell^1(pix))$
can be used in our framework.

Based on the class of methods presented by Sapiro, Goldluecke et al. [23] showed that the natural
choice for vectorial TV arising from geometric measure theory is to penalize the largest singular value
of the Jacobian:

$$\mathrm{VTV}(u) = \int_\Omega \sigma_1(\nabla u)\, dx, \qquad (15)$$

where $\sigma_1$ is the largest singular value of $\nabla u$ or, equivalently, the square root of the largest eigenvalue
of the structure tensor $(\nabla u)^\top \nabla u$. The regularization introduced in [23] is known as the spectral or
Schatten-$\infty$ norm and fits in our framework as $(S^\infty(col, der), \ell^1(pix))$.

Recently, Holt [29] interpreted (?) as a special case of spatially-local coupling models. The
author proposed to smooth a differentiable function $u$ by penalizing its Jacobian matrix:

$$\mathrm{VTV}(u) := \int_\Omega \phi(Ju(x))\, dx, \qquad (16)$$

where $Ju : \Omega \to \mathbb{R}^{M \times C}$ denotes the Jacobian matrix of $u$, so $[Ju(x)]_{j,k} = \partial_{x_j} u_k(x)$. This Jacobian
framework is closely related to (13), since the structure tensor is given by $Ju(x)\, Ju^\top(x)$ at each point
in the image. Note that (10) and (11) are special cases of (16); however, any method using spatially-global
coupling such as (9) is not covered by Holt's approach. In [29], the author considered only
functions that are written in terms of the singular values of $Ju$. Therefore, the Frobenius norm (14)
follows from $\phi := \sqrt{\lambda^+ + \lambda^-}$, the spectral norm (15) follows from $\phi := \sqrt{\lambda^+}$, and the nuclear norm
[34] follows from $\phi := \sqrt{\lambda^+} + \sqrt{\lambda^-}$. In our framework, the regularizations arising from (16) are given
by $(S^p(col, der), \ell^1(pix))$.
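The link between the eigenvalues of the structure tensor and the Schatten norms of the Jacobian can be checked numerically at a single pixel. In the sketch below the Jacobian is a random $2 \times 3$ matrix (two derivatives, three channels); this layout, the random seed and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.standard_normal((2, 3))        # hypothetical Jacobian at one pixel: M = 2, C = 3

# Eigenvalues of the 2 x 2 structure tensor J J^T, returned in ascending order.
lam_minus, lam_plus = np.linalg.eigvalsh(J @ J.T)
lam_minus = max(lam_minus, 0.0)        # guard against tiny negative round-off

frobenius = np.sqrt(lam_plus + lam_minus)          # phi for the Frobenius norm (14)
spectral = np.sqrt(lam_plus)                       # phi for the spectral norm (15)
nuclear = np.sqrt(lam_plus) + np.sqrt(lam_minus)   # phi for the nuclear norm

print(frobenius, np.linalg.norm(J, "fro"))
print(spectral, np.linalg.norm(J, 2))
print(nuclear, np.linalg.norm(J, "nuc"))
```

Each value obtained from the eigenvalues agrees with the corresponding matrix norm computed directly from the singular values, since the singular values of the Jacobian are the square roots of the eigenvalues of the structure tensor.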

Another relevant approach based on Riemann geometry was pioneered by
Kimmel, Malladi, and Sochen [30, ?], who considered the graph of an image embedded in a
$(C + 2)$-dimensional space and proposed an area minimizing flow. This class of regularizations leads
to diffusion equations with the direction given by the Beltrami flow. Roussos and Maragos [?]
generalized the Beltrami flow by using higher dimensional mappings which depend on image patches:

$$\mathrm{VTV}(u) := \int_\Omega \psi(\lambda^+, \lambda^-)\, dx, \qquad (17)$$

where $\psi$ is increasing with respect to both arguments, and $\lambda^+$ and $\lambda^-$ are the larger and smaller
eigenvalues of the structure tensor $K_\tau * (\nabla u)^\top \nabla u$, with $K_\tau$ being a non-negative, rotationally
symmetric convolution kernel. In subsequent works [33, 34], a deeper analysis for the particular choice
$\psi(\lambda^+, \lambda^-) = \| (\sqrt{\lambda^+}, \sqrt{\lambda^-}) \|_p$ was developed. There, the tensor TV that arises when $p = 1$ is
renamed as the nuclear norm, which can be written as $(S^1(col, der), \ell^1(pix))$ in our case. In order
to incorporate information from the vicinity of every point in the image domain as in (17), we
only have to incorporate the nonlocal gradient operator (see [14] for more details) and penalize the
resulting structure with the help of the $(S^1(col, der), \ell^1(pix))$ norm. Contrary to our approach,
neither spatially-global coupling norms like (9) nor TV with channel coupling are covered by
(17).

3.3 Other Vectorial TV Models 

For the sake of completeness, we should also mention that there exist several further TV variants,
such as nonconvex regularizations based on $\ell^p$ norms with $p < 1$ [32], and nonconvex extensions for
minimizing the rank of submatrices in a TGV framework [?]. Additional work has been done on
improving TV with the help of Bregman iteration [36, 38]. The study of the previously mentioned
classes of methods, however, goes beyond the scope of this paper.

4 Which Channel Coupling Disfavours Color Artifacts? 

To discuss which CTV methods work better, we have to understand what kind of
properties they try to impose on the reconstructed image. In this section, we analyze the differences
between a color coupling in the $\ell^1$, $\ell^2$ and $\ell^\infty$ fashion with the help of the generalized concept
of singular vectors [3]. The question of whether a strong or a weak coupling leads to better results
depends on the type of correlation in the data. Our investigation explains why the $\ell^\infty$ norm leads
to the strongest relation, makes the most prior assumptions and has the greatest potential to reduce
color artifacts.

Benning and Burger developed in [3] a generalization of the concept of singular vectors and
singular values for arbitrary convex regularizations, and showed that a signal can be restored particularly
well if it is a singular vector of the regularization that is used. The authors also showed
that, even in the case of noisy data, an exact reconstruction (up to a loss of contrast) is possible
under certain conditions. In this sense, they provided a theoretical basis for explaining that TV
regularization works particularly well for piecewise constant images.

In order to analyze the behaviour of collaborative norms for vectorial TV, we restrict ourselves
to the case of image denoising modelled by anisotropic vectorial TV, namely comparing the
$\ell^{1,1,1}(col, der, pix)$, $\ell^{2,1,1}(col, der, pix)$ and $\ell^{\infty,1,1}(col, der, pix)$
norms. Let $D$ denote the usual local discrete gradient operator such that $Du \in Y$, and consider
only the energy due to the regularization, that is, $J(u) := F(Du) = \|Du\|_{\vec{b},a}$. We fix
$M = 2$ since only $x$- and $y$-derivatives are considered. Furthermore, let $(x, y)$ denote any
pixel of the image, with $x$ being the row and $y$ the column in the rectangular domain. We provide
below the definitions of singular value and singular vector for image denoising problems.

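Definition 2. Let $J$ be a convex functional with $\partial J(u) \neq \emptyset$ at every
$u \in \operatorname{dom} J$. Then, every function $u_\lambda \in X$ satisfying
$\|u_\lambda\| = 1$ and $\lambda u_\lambda \in \partial J(u_\lambda)$ is called a singular vector
of $J$ with corresponding singular value $\lambda$.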

In the case of $J$ being one-homogeneous, $\lambda u_\lambda \in \partial J(u_\lambda)$ even
implies $\lambda = J(u_\lambda)$, which easily follows from Euler's identity [50]:
$\langle v, u \rangle = J(u)$ for any $v \in \partial J(u)$.
Note that, for any $u \in \partial J(u)$, one can define $\lambda := \sqrt{J(u)} = \|u\|$ and
$u_\lambda := u / \lambda$ such that $u_\lambda$ is a singular vector. We will therefore omit
$\lambda$ and focus on the construction of $u \in X$ satisfying $u \in \partial J(u)$. Since
$J(u) = \|Du\|_{\vec{b},a}$, the latter condition is met if $u$ can be written as $u = D^\top z$
for some $z \in \partial_{Du}(\|Du\|_{\vec{b},a})$ which, by the characterization of the
subdifferential established earlier, is equivalent to

$$\langle z, Du \rangle = \|Du\|_{\vec{b},a} \quad \text{and} \quad \|z\|_{\vec{b}^*,a^*} \leq 1. \tag{18}$$


In what follows, all mathematical proofs have been moved to Appendix A. We aim at finding
some $z \in Y$ satisfying (18) to determine $u = D^\top z \in \partial J(u)$. Motivated by [3], it
makes sense to consider piecewise linear functions whose changes happen only at $\{-1, +1\}$. More
specifically, we choose

$$z_k^1(x, y) = c_k^1\, l_k^1(x) \quad \text{and} \quad z_k^2(x, y) = c_k^2\, l_k^2(y), \quad \forall k \in \{1, \dots, C\}, \tag{19}$$

for some $c_k^j \in \mathbb{R}$, and having the following properties: $|l_k^j(x)| \leq 1$ for all
$x$, $l_k^j$ piecewise linear, and the linearity changing at $x$ only if $|l_k^j(x)| = 1$. The
details of why these functions have to look like this are left for the proof in Appendix A. We
simply illustrate examples for $z_k^1$ and $z_k^2$ in Figure 2.
It is remarkable that singular vectors to the CTV methods under consideration can all be written
in the form of (19) and only differ in two aspects. First, the $\ell^1$ case allows different
$l_k^j$ for different color channels, while the $\ell^2$ and $\ell^\infty$ norms do not. Second,
the coefficients $c_k^j$ are different for each regularization.




[Figure 2, two panels: illustration of $z_k^1$; illustration of $z_k^2$.]

Figure 2: Examples for functions $z_k^1$ and $z_k^2$. As we can see, they are constant in one
direction, piecewise linear in the other, and the points where the piecewise linearity changes are
at $\{-1, +1\}$.


Table 2 shows the precise construction of singular vectors. The results displayed there meet
what we would expect based on the regularization behavior of the different methods. For the
$\ell^1$ case, each channel can have its own $l_k^j$ such that jumps can be at different positions
in the different channels. Since no relation on the positions of the jumps in different channels is
imposed, we can expect the $\ell^{1,1,1}$ norm to neither suppress color artifacts nor change the
position of the edges. This is a theoretical explanation for what we observed earlier. Both
$\ell^2$ and $\ell^\infty$ couplings require the $l_k^j$ to be independent of $k$, that is, jumps
in different color channels are encouraged to be at the same position. The difference between them
is that the sizes of the jumps, corresponding to the coefficients $c_k^j$, are allowed to be
arbitrary in the $\ell^2$ case, while they have to be either zero or of equal magnitude in the
$\ell^\infty$ norm. Equal magnitude of the jumps in all three color channels leads to a grayscale
image. This tells us that the regularization based on $\ell^\infty$, as opposed to $\ell^2$,
encourages jumps that occur in all three channels to only change the intensity but not the color of
the image. Looking at the experimental results, we can see again that the singular vector analysis
confirms exactly what we observed in practice.


| Regularization | Singular vectors | Properties |
|---|---|---|
| $\|Du\|_{1,1,1}$ | $u_k(x, y) = -c_k^1 D_x l_k^1(x) - c_k^2 D_y l_k^2(y)$ | $c_k^j \in \{0, \pm 1\}$ |
| $\|Du\|_{2,1,1}$ | $u_k(x, y) = -c_k^1 D_x l^1(x) - c_k^2 D_y l^2(y)$ | The piecewise linear functions $l^j$ do not depend on $k$, $\|c^j\|_2 = 1$ |
| $\|Du\|_{\infty,1,1}$ | $u_k(x, y) = -c_k^1 D_x l^1(x) - c_k^2 D_y l^2(y)$ | The piecewise linear functions $l^j$ do not depend on $k$, $c_k^j \in \{0, \pm 1\}$ |

Table 2: Comparison of singular vectors for coupling the color channels in an $\ell^1$, $\ell^2$ and
$\ell^\infty$ fashion. In this setting, $D_x$ and $D_y$ denote the differences in the horizontal and
vertical directions, respectively.

For illustration purposes, Figure 3 shows some examples of singular vectors. Depending on
the type of jumps in the data, that is, jumps in different color channels being independent of one
another, jumps being at the same position but changing the color, or jumps being at the same
position and likely not changing the color, the $\ell^{1,1,1}$, the $\ell^{2,1,1}$ or the
$\ell^{\infty,1,1}$ norm will show a superior performance. Interestingly, our numerical results in
Section 6 indicate that a suppression of color artifacts by using $\ell^{\infty,1,1}$ is more
important than making weaker and more general assumptions on the types of jumps in natural images.


5 Numerical Minimization 


It is remarkable that all variants of different norms imposed on the three-dimensional structure can
be minimized very efficiently by using splitting techniques. The only thing that changes when
changing the regularization is the proximity operator, which is discussed below.

Recall that the proximity operator of a proper, convex, and l.s.c. function $f$ is

$$\operatorname{prox}_{\tau f}(x) = \arg\min_{y} \frac{1}{2} \|y - x\|_2^2 + \tau f(y), \tag{20}$$

where $\tau > 0$ is a scalar parameter. Furthermore, Moreau's identity connects the proximity
operator of $f$ with that of its Legendre-Fenchel conjugate $f^*$ in the following way:

$$x = \operatorname{prox}_{\tau f}(x) + \tau \operatorname{prox}_{f^*/\tau}\left(\frac{x}{\tau}\right). \tag{21}$$
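As a quick numerical illustration of Moreau's identity (21), the following sketch takes $f = \|\cdot\|_1$, for which both proximity operators are known in closed form: soft thresholding for $\tau f$, and the projection onto the unit $\ell^\infty$ ball for $f^*/\tau$. The function names are hypothetical and chosen only for this example.

```python
import numpy as np

def prox_l1(x, tau):
    """Proximity operator of tau * ||.||_1 (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_conj_l1(x, tau):
    """Proximity operator of f*/tau for f = ||.||_1.

    The conjugate f* is the indicator of the unit l-infinity ball, so the
    prox is the projection onto [-1, 1]^n, independently of tau > 0.
    """
    return np.clip(x, -1.0, 1.0)

def moreau_gap(x, tau):
    """Largest deviation from x = prox_{tau f}(x) + tau * prox_{f*/tau}(x / tau)."""
    rhs = prox_l1(x, tau) + tau * prox_conj_l1(x / tau, tau)
    return np.max(np.abs(x - rhs))
```

Calling `moreau_gap` on any vector should return a value at the level of floating-point round-off.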


5.1 Proximal Map of CTV Regularizers 

The characterization of the subdifferential allows us to write the optimality condition of (20) as

$$\hat{A} = \operatorname{prox}_{\tau\|\cdot\|_{\vec{b},a}}(A) \iff \|A - \hat{A}\|_{\vec{b}^*,a^*} \leq \tau \ \text{ and } \ \langle A, \hat{A} \rangle = \tau \|\hat{A}\|_{\vec{b},a} + \|\hat{A}\|_2^2, \tag{22}$$

for any $A \in Y$. In this setting, $\|\cdot\|_2$ denotes the Euclidean norm applied to the vectorial
structure obtained by rearranging the original three-dimensional matrix into a vector. When it is
not possible to obtain an explicit solution from (22), one usually invokes duality through Moreau's
identity


$$\hat{A} = A - \operatorname{proj}_{\tau\|\cdot\|_{\vec{b}^*,a^*}}(A), \tag{23}$$

where $\operatorname{proj}_{\tau\|\cdot\|_{\vec{b}^*,a^*}}$ denotes the projection operator onto
the dual ball of radius $\tau$.

[Figure 3, six panels:]

• Image generated from $l_k^j$ which are different for all $k$, resulting in an image where all
three color channels have jumps at different positions. Up to a scaling, this is a singular vector
to $\ell^{1,1,1}$ but not to $\ell^{2,1,1}$ or $\ell^{\infty,1,1}$.

• Image with four instead of two jumps in each channel and in each direction. Again, the edge sets
of the different color channels are different. Up to a scaling, this is a singular vector to
$\ell^{1,1,1}$ but not to $\ell^{2,1,1}$ or $\ell^{\infty,1,1}$.

• Image for which the jumps are at the same positions (all $l_k^j$ are equal), but at which the
coefficients are $[0.54, 0.2, -0.82]$. Up to a scaling, this is a singular vector to
$\ell^{2,1,1}$, not to $\ell^{1,1,1}$ or $\ell^{\infty,1,1}$.

• Image with four jumps in each direction, jumps in the different color channels being aligned, and
color channel coefficients being $[0.2, -0.59, -0.78]$. Up to a scaling, this is a singular vector
to $\ell^{2,1,1}$ but not to $\ell^{1,1,1}$ or $\ell^{\infty,1,1}$.

• Image with jumps at the same positions and with all coefficients being equal to one. Up to a
scaling, this is a singular vector to $\ell^{1,1,1}$, $\ell^{2,1,1}$ and $\ell^{\infty,1,1}$.

• Image with four jumps in each direction, jumps in the different color channels being aligned, and
color channel coefficients being $[1, -1, -1]$. Up to a scaling, this is a singular vector to
$\ell^{1,1,1}$, $\ell^{2,1,1}$ and $\ell^{\infty,1,1}$.

Figure 3: Illustrating singular vectors to vectorial TV regularization using collaborative norms.
While the true singular vectors have zero mean, the above images have been rescaled to lie between
zero and one for visualization purposes.


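Example 3. We now display the proximal mappings of the regularizations based on $\ell^{p,q,r}$
norms which will be used in the experimental section.

• The proximity operator of the $\ell^{1,1,1}$ norm decouples in all variables and each problem just
contains an absolute value penalty:

$$\left(\operatorname{prox}_{\tau\|\cdot\|_{1,1,1}}(A)\right)_{i,j,k} = \max(|A_{i,j,k}| - \tau, 0)\, \operatorname{sign}(A_{i,j,k}).$$

By a short computation, one obtains the proximal mapping of the $\ell^{2,1,1}$ norm as the
(generalized) shrinkage:

$$\left(\operatorname{prox}_{\tau\|\cdot\|_{2,1,1}}(A)\right)_{i,j,k} = \frac{A_{i,j,k}}{\|A_{i,j,:}\|_2} \max(\|A_{i,j,:}\|_2 - \tau, 0),$$

as well as the proximal mapping of the $\ell^{2,2,1}$ norm:

$$\left(\operatorname{prox}_{\tau\|\cdot\|_{2,2,1}}(A)\right)_{i,j,k} = \frac{A_{i,j,k}}{\|A_{i,:,:}\|_2} \max(\|A_{i,:,:}\|_2 - \tau, 0).$$

• Whenever the supremum norm is involved, it is more convenient to use (23) to express the
proximity operator by the proximity operator of its dual. For the $\ell^{\infty,1,1}$ norm, one has

$$\left(\operatorname{prox}_{\tau\|\cdot\|_{\infty,1,1}}(A)\right)_{i,j,k} = A_{i,j,k} - \tau \operatorname{sign}(A_{i,j,k}) \left(\operatorname{proj}_{\|\cdot\|_1 \leq 1}\left(\frac{|A_{i,j,:}|}{\tau}\right)\right)_k,$$

where $|A_{i,j,:}|$ denotes the component-wise absolute value of the vector $A_{i,j,:}$, and
$\operatorname{proj}_{\|\cdot\|_1 \leq 1}$ the projection onto the unit $\ell^1$ norm ball.
Similarly, we obtain the proximity operator of the $\ell^{\infty,\infty,1}$ norm as

$$\left(\operatorname{prox}_{\tau\|\cdot\|_{\infty,\infty,1}}(A)\right)_{i,j,k} = A_{i,j,k} - \tau \operatorname{sign}(A_{i,j,k}) \left(\operatorname{proj}_{\|\cdot\|_{1,1} \leq 1}\left(\frac{|A_{i,:,:}|}{\tau}\right)\right)_{j,k},$$

with $\operatorname{proj}_{\|\cdot\|_{1,1} \leq 1}$ denoting the projection operator onto the unit
$\ell^{1,1}$ ball. Finally, the proximity operator of the $\ell^{\infty,2,1}$ norm is

$$\left(\operatorname{prox}_{\tau\|\cdot\|_{\infty,2,1}}(A)\right)_{i,j,k} = A_{i,j,k} - \tau \operatorname{sign}(A_{i,j,k}) \left(\operatorname{proj}_{\|\cdot\|_{1,2} \leq 1}\left(\frac{|A_{i,:,:}|}{\tau}\right)\right)_{j,k},$$

where $\operatorname{proj}_{\|\cdot\|_{1,2} \leq 1}$ denotes the projection operator onto the unit
$\ell^{1,2}$ ball.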
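The closed forms of Example 3 translate almost line by line into array code. The sketch below is only an illustration under stated assumptions: the tensor `A` is stored with shape `(N, M, C)` (pixels, derivatives, colors), all function names are hypothetical, and the projection onto the $\ell^1$ ball uses the standard sort-based algorithm, which is a choice made here and not something prescribed by the text.

```python
import numpy as np

def prox_l111(A, tau):
    """prox of tau * l^{1,1,1}: entrywise soft thresholding."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def _group_shrink(A, tau, axes):
    """Shrink the Euclidean norm taken over `axes` by tau (generalized shrinkage)."""
    norm = np.sqrt(np.sum(A ** 2, axis=axes, keepdims=True))
    return A * np.maximum(norm - tau, 0.0) / np.maximum(norm, 1e-12)

def prox_l211(A, tau):
    """prox of tau * l^{2,1,1}: l2 over colors, then l1 over derivatives and pixels."""
    return _group_shrink(A, tau, axes=2)

def prox_l221(A, tau):
    """prox of tau * l^{2,2,1}: l2 over colors and derivatives, l1 over pixels."""
    return _group_shrink(A, tau, axes=(1, 2))

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of a vector onto the l1 ball (sort-based)."""
    u = np.abs(v)
    if u.sum() <= radius:
        return v.copy()
    s = np.sort(u)[::-1]
    css = np.cumsum(s)
    k = np.arange(1, len(s) + 1)
    rho = np.nonzero(s * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(u - theta, 0.0)

def prox_linf11(A, tau):
    """prox of tau * l^{inf,1,1} via Moreau's identity and the l1-ball projection."""
    out = np.empty(A.shape, dtype=float)
    n_pix, n_der, _ = A.shape
    for i in range(n_pix):
        for j in range(n_der):
            a = A[i, j, :]
            out[i, j, :] = a - tau * np.sign(a) * project_l1_ball(np.abs(a) / tau)
    return out
```

For the color vector `[3, 0, 4]` and `tau = 1`, the $\ell^{\infty}$ coupling only lowers the largest entry, whereas the $\ell^{2}$ coupling rescales the whole vector and the $\ell^{1}$ coupling shrinks every entry independently.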

Let us now discuss the proximity operator of the $\ell^{2,\infty,1}$ norm. For that purpose, we
require a previous result that states the chain rule for subdifferentials. The proof is outlined in
Appendix B.

Theorem 3 (Chain Rule of Subdifferentials). Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a
vector-valued function such that $f_j : \mathbb{R}^n \to \mathbb{R}$ is proper and convex for each
$j \in \{1, \dots, m\}$. Let $g : \mathbb{R}^m \to \mathbb{R}$ be convex, proper and nondecreasing
in each argument. Then,

$$\partial (g \circ f)(x_0) \supseteq \left\{ \xi \in \mathbb{R}^n : \xi = \sum_{j=1}^{m} q_j v_j, \ q = (q_1, \dots, q_m) \in \partial g(f(x_0)), \ v_j \in \partial f_j(x_0), \ \forall\, 1 \leq j \leq m \right\} \tag{24}$$

at any $x_0 \in \operatorname{dom}(g \circ f)$. If further
$x_0 \in \operatorname{int} \operatorname{dom}(g \circ f)$ and all $f_j$ are locally l.s.c., then
the inclusion in (24) becomes an equality.

Although it seems to be difficult to compute the proximal mapping of the $\ell^{2,\infty,1}$ norm at
first glance, Theorem 3 leads to a particularly interesting observation regarding functionals having
an $\ell^2$ norm as an inner regularization. The following result provides the key for computing
this class of proximal operators.

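Theorem 4. Let $f : \mathbb{R}^{n \times m} \to \mathbb{R}^n$ be defined as

$$f_i(u) := \|u_{i,:}\|_2 = \left( \sum_{j=1}^{m} u_{i,j}^2 \right)^{1/2}, \quad \forall i \in \{1, \dots, n\}, \ \forall u \in \mathbb{R}^{n \times m},$$

and let $g : \mathbb{R}^n \to \mathbb{R}$ be any proper, convex function that is nondecreasing in
each argument. Then, the proximity operator of $g \circ f$ is

$$\left(\operatorname{prox}_{\tau (g \circ f)}(u)\right)_{i,j} = \frac{u_{i,j}}{\|u_{i,:}\|_2} \max(\|u_{i,:}\|_2 - \tau v_i, 0), \quad \forall i, j,$$

where $v_i$, $i \in \{1, \dots, n\}$, are the components of the vector $v \in \mathbb{R}^n$ given by

$$v = \arg\min_{w \in \mathbb{R}^n} \frac{1}{2} \left\| w - \frac{1}{\tau} f(u) \right\|_2^2 + \frac{1}{\tau} g^*(w). \tag{25}$$

Proof. The optimality condition arising from (20) yields

$$\operatorname{prox}_{\tau (g \circ f)}(u) = u - \tau \xi, \quad \text{for some } \xi \in \partial (g \circ f)\left(\operatorname{prox}_{\tau (g \circ f)}(u)\right). \tag{26}$$

Let us define $\hat{u}$ as

$$\hat{u}_{i,j} = \frac{u_{i,j}}{\|u_{i,:}\|_2} \max(\|u_{i,:}\|_2 - \tau v_i, 0), \quad \forall i, j, \tag{27}$$

where $v_i$, $i \in \{1, \dots, n\}$, are the components of the vector $v \in \mathbb{R}^n$ solving
(25). We aim at proving that $\hat{u}$ satisfies (26). For that purpose, note that $\hat{u}$ in (27)
can be stated as the solution of a weighted $\ell^{1,2}$ regularized problem:

$$\hat{u}_{i,:} = u_{i,:} - \tau v_i z_i, \quad \text{for some } z_i \in \partial\left(\|\hat{u}_{i,:}\|_2\right), \ \forall i \in \{1, \dots, n\}. \tag{28}$$

In view of (26) and (28), we only need to prove that the matrix with rows $v_i z_i$ is in
$\partial (g \circ f)(\hat{u})$. Due to the chain rule stated in Theorem 3, this follows whenever
$z_i \in \partial f_i(\hat{u})$, which is true by definition of $z_i$, and
$v \in \partial g(f(\hat{u}))$. In order to prove the latter, note that (25) yields the optimality
condition

$$t = f(u) - \tau v, \tag{29}$$

for some $t \in \partial g^*(v)$ or, equivalently, $v \in \partial g(t)$. It is thus sufficient to
show that $t_i = \|\hat{u}_{i,:}\|_2 = f_i(\hat{u})$. From (29), we see that
$t_i = \|u_{i,:}\|_2 - \tau v_i$ for each $i \in \{1, \dots, n\}$. Since $g$ is nondecreasing in
each argument, its proximity operator is nonnegativity preserving and so $t_i \geq 0$. Consequently,
if $\|\hat{u}_{i,:}\|_2 = 0$, then (27) implies $\|u_{i,:}\|_2 \leq \tau v_i$ and, thus, $t_i = 0$.
Otherwise, it follows that

$$\|\hat{u}_{i,:}\|_2 = \left\| \frac{u_{i,:}}{\|u_{i,:}\|_2} \left(\|u_{i,:}\|_2 - \tau v_i\right) \right\|_2 = \left| \|u_{i,:}\|_2 - \tau v_i \right| = t_i,$$

which completes the proof.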


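□

Example 4. By Theorem 4, the proximity operator of the $\ell^{2,\infty,1}$ norm is

$$\left(\operatorname{prox}_{\tau\|\cdot\|_{2,\infty,1}}(A)\right)_{i,j,k} = \frac{A_{i,j,k}}{\|A_{i,j,:}\|_2} \max(\|A_{i,j,:}\|_2 - \tau v_{i,j}, 0),$$

where

$$v_{i,:} = \operatorname{proj}_{\|\cdot\|_1 \leq 1}\left( \frac{1}{\tau} \left(\|A_{i,j,:}\|_2\right)_j \right).$$

In the above formula, $(\|A_{i,j,:}\|_2)_j$ denotes the vector we obtain by stacking
$\|A_{i,j,:}\|_2$ for all $j \in \{1, \dots, M\}$. Note that the proximal mappings associated to
$\ell^{2,1,1}$ and $\ell^{2,2,1}$ can also be computed by means of Theorem 4.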
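The recipe of Theorem 4 can be sketched for a single pixel as follows, assuming $g = \|\cdot\|_\infty$ so that $g^*$ is the indicator of the unit $\ell^1$ ball and the minimization defining $v$ reduces to a projection. The matrix `U` plays the role of $A_{i,:,:}$ with one row per index $j$; the function names and the sort-based projection are choices made for this illustration only.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of a vector onto the l1 ball (sort-based)."""
    u = np.abs(v)
    if u.sum() <= radius:
        return v.copy()
    s = np.sort(u)[::-1]
    css = np.cumsum(s)
    k = np.arange(1, len(s) + 1)
    rho = np.nonzero(s * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(u - theta, 0.0)

def prox_l2_inf(U, tau):
    """prox of tau * max_i ||U[i, :]||_2, following Theorem 4 with g = l-infinity."""
    f = np.linalg.norm(U, axis=1)            # f_i(u) = ||u_{i,:}||_2
    v = project_l1_ball(f / tau)             # minimizer of (25) when g* is an l1-ball indicator
    scale = np.maximum(f - tau * v, 0.0) / np.maximum(f, 1e-12)
    return U * scale[:, None]                # row-wise shrinkage with thresholds tau * v_i

def objective(X, U, tau):
    """Energy minimized by the proximity operator, used as a sanity check."""
    return 0.5 * np.sum((X - U) ** 2) + tau * np.max(np.linalg.norm(X, axis=1))
```

Only the rows with the largest norms are shrunk, which is the behaviour one expects from an outer supremum norm; comparing `objective` at the output and at nearby points gives a simple check of optimality.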


Finally, the proximal mapping associated to the $(S^p, \ell^1)$ norm is a simple combination of a
singular value decomposition followed by the proximity operator of the corresponding $\ell^p$ norm.
Since the regularizations considered in this paper, which are based on the $(S^1, \ell^1)$ and
$(S^\infty, \ell^1)$ norms, have an outer $\ell^1$ norm, the computation of their proximity
operators decouples at each pixel. By denoting $B := A_{i,:,:}$, we are thus left with a problem of
the form

$$\min_{D \in \mathbb{R}^{C \times M}} \frac{1}{2} \|D - B\|_2^2 + \tau \|D\|_{S^p}, \tag{30}$$

the solution of which is given in the following proposition.

Proposition 1. Let $U \Sigma_0 V^\top$ be the singular value decomposition of a matrix
$B \in \mathbb{R}^{C \times M}$. Then, the proximity operator of the $S^p$-norm is given by

$$\operatorname{prox}_{\tau S^p}(B) = B V \Sigma_0^\dagger \Sigma V^\top,$$

where $\Sigma_0^\dagger$ denotes the pseudo-inverse matrix of $\Sigma_0$ and $\Sigma$ is the
diagonal matrix with entries $\operatorname{prox}_{\tau\|\cdot\|_p}(\operatorname{diag}(\Sigma_0))$.

In the following example, we show the proximal mappings of the CTV regularizations using the
Schatten norms we are interested in.

Example 5. For $A \in Y$, the proximity operator of the $(S^p, \ell^1)$ norm is

$$\left(\operatorname{prox}_{\tau (S^p, \ell^1)}(A)\right)_{i,:,:} = A_{i,:,:} V_i \left(\operatorname{prox}_{\tau\|\cdot\|_p}(\operatorname{diag}(\Sigma_i))\right) \Sigma_i^\dagger V_i^\top, \quad \forall i \in \{1, \dots, N\},$$

where $A_{i,:,:} = U_i \Sigma_i V_i^\top$ is the singular value decomposition of
$A_{i,:,:} \in \mathbb{R}^{C \times M}$.
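For the nuclear norm ($p = 1$), the proximity operator on the singular values is a soft thresholding, and the formula of Proposition 1 can be checked against the more familiar form $U \Sigma V^\top$. The sketch below assumes the per-pixel matrices are stacked in an array of shape `(N, C, M)`; the function names are hypothetical, and only the case $p = 1$ is covered.

```python
import numpy as np

def prox_nuclear_batch(A, tau):
    """prox of tau * sum_i ||A[i]||_{S^1} for A of shape (N, C, M), via batched SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)            # prox of tau * ||.||_1 on singular values
    return (U * s_shrunk[:, None, :]) @ Vt         # U diag(s_shrunk) V^T for every pixel

def prox_schatten1_prop1(B, tau):
    """Same operator for one matrix, written as B V pinv(Sigma_0) Sigma V^T."""
    _, s0, Vt = np.linalg.svd(B, full_matrices=False)
    s = np.maximum(s0 - tau, 0.0)
    # Pseudo-inverse of the diagonal: invert only the nonzero singular values.
    s0_pinv = np.where(s0 > 1e-12, 1.0 / np.maximum(s0, 1e-12), 0.0)
    return B @ Vt.T @ np.diag(s0_pinv * s) @ Vt
```

The second form never uses the left singular vectors, so in practice it would suffice to diagonalize the small matrix $B^\top B$; the full SVD is used here only to keep the sketch short.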


5.2 Solving the Minimization Problem 

For solving the optimization problem (§ that arises from the proposed collaborative TV, we use 
the primal-dual hybrid gradient (PDHG) method [3 Uni [HI [53] , a powerful optimization algorithm 
that breaks complex problems into simple sub-steps and can handle non-smoothness of the energy 
functional. 

By introducing an auxiliary variable $g \in Y$ and the constraint $Ku = g$, we obtain the following
formulation of the original problem:

$$\min_{u \in X,\, g \in Y} G(u) + \|g\|_{\vec{b},a} \quad \text{subject to } Ku = g.$$

Now, consider the Lagrangian
$L(u, g, q) = G(u) + \|g\|_{\vec{b},a} + \langle q, Ku - g \rangle_Y$; the associated primal-dual
problem is

$$\max_{q \in Y} \min_{u \in X,\, g \in Y} G(u) + \|g\|_{\vec{b},a} + \langle q, Ku - g \rangle_Y. \tag{31}$$

The PDHG algorithm iteratively computes the solution of the associated saddle-point problem (31) by
means of

$$u^{n+1} = \operatorname{prox}_{\tau_n G}\left(u^n - \tau_n K^\top q^n\right),$$
$$\bar{u}^{n+1} = u^{n+1} + \left(u^{n+1} - u^n\right),$$
$$g^{n+1} = \operatorname{prox}_{\frac{1}{\sigma_n}\|\cdot\|_{\vec{b},a}}\left(K \bar{u}^{n+1} + \frac{q^n}{\sigma_n}\right),$$
$$q^{n+1} = q^n + \sigma_n \left(K \bar{u}^{n+1} - g^{n+1}\right),$$

where $n \geq 0$ is the iteration number, and $\tau_n, \sigma_n > 0$ are the step-size parameters.
The algorithm basically consists of alternating a gradient descent in the primal variables $u$ and
$g$, and a gradient ascent in the dual variable $q$.
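To make the iteration concrete, here is a minimal sketch of PDHG for a quadratic data term $G(u) = \frac{\lambda}{2}\|u - f\|_2^2$ combined with the $\ell^{2,1,1}$ regularizer. Several choices are assumptions made for this illustration and not taken from the text: constant step sizes $\tau = \sigma = 0.25$ (so that $\tau \sigma \|K\|^2 < 1$ for forward differences), Neumann boundary handling, a fixed iteration count in place of a stopping criterion, and all function names.

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary; (H, W, C) -> (H, W, 2, C)."""
    g = np.zeros(u.shape[:2] + (2,) + u.shape[2:])
    g[:-1, :, 0] = u[1:] - u[:-1]
    g[:, :-1, 1] = u[:, 1:] - u[:, :-1]
    return g

def grad_T(q):
    """Adjoint of grad (the negative divergence); (H, W, 2, C) -> (H, W, C)."""
    u = np.zeros(q.shape[:2] + q.shape[3:])
    u[:-1] -= q[:-1, :, 0]
    u[1:] += q[:-1, :, 0]
    u[:, :-1] -= q[:, :-1, 1]
    u[:, 1:] += q[:, :-1, 1]
    return u

def prox_l211(A, t):
    """Generalized shrinkage: l2 over the color axis, thresholded by t."""
    norm = np.sqrt(np.sum(A ** 2, axis=-1, keepdims=True))
    return A * np.maximum(norm - t, 0.0) / np.maximum(norm, 1e-12)

def energy(u, f, lam):
    """lam/2 * ||u - f||^2 plus the l^{2,1,1} norm of the gradient tensor."""
    tv = np.sum(np.sqrt(np.sum(grad(u) ** 2, axis=-1)))
    return 0.5 * lam * np.sum((u - f) ** 2) + tv

def pdhg_denoise(f, lam, n_iter=200, tau=0.25, sigma=0.25):
    """PDHG iteration with over-relaxed primal variable and constant steps."""
    u = f.copy()
    q = np.zeros(grad(f).shape)
    for _ in range(n_iter):
        u_old = u
        # Primal step: prox of tau * G for the quadratic fidelity term.
        u = (u - tau * grad_T(q) + tau * lam * f) / (1.0 + tau * lam)
        u_bar = 2.0 * u - u_old
        # Auxiliary variable: prox of the regularizer with parameter 1 / sigma.
        g = prox_l211(grad(u_bar) + q / sigma, 1.0 / sigma)
        # Dual ascent on the constraint K u = g.
        q = q + sigma * (grad(u_bar) - g)
    return u
```

Swapping `prox_l211` for another proximity operator is the only change needed to try a different collaborative norm, which mirrors the remark that the regularization enters the algorithm only through its proximal mapping.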

6 Applications to Image Processing 


We present an extensive performance evaluation of different CTV based methods on several inverse
problems in color imaging such as denoising, deblurring, and inpainting. In these cases, one
typically introduces a weighting constant $\lambda > 0$ that controls the trade-off between $G$,
which forces the solution of the optimization problem to be close to some given data, and the
regularization term:

$$\min_{u \in X} \frac{\lambda}{2} G(u) + \|Ku\|_{\vec{b},a}.$$

For the sake of consistency among comparisons, we solved each problem with a range of different
values of $\lambda$ and only reported the best result for each regularization and each degradation
condition in terms of the highest peak signal-to-noise ratio (PSNR). Furthermore, we chose the
linear operator $K$ to be the discrete local gradient computed via forward differences. In all
tests, we used images from the Kodak collection (http://r0k.us/graphics/kodak/), and all results
were saved in integer values relative to the intensity range $[0, 255]$.


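In view of the optimality conditions of (31), one defines the following sequences of primal and
dual residuals:

$$P_{n+1} := \frac{1}{\tau_n} \left(u^n - u^{n+1}\right) - K^\top \left(q^n - q^{n+1}\right),$$
$$D_{n+1} := \frac{1}{\sigma_n} \left(q^n - q^{n+1}\right) - K \left(u^n - u^{n+1}\right).$$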

As stopping criterion we used a tolerance value of 10“® for the average of the above residuals per 
pixel. In any case, we stopped the algorithm after 1000 iterations even if the tolerance was not 
reached. 


6.1 Image Denoising: CTV-$\ell^2$ Model

We propose to extend the well-known ROF model to color images by using CTV regularization.
The primal problem is therefore given by

$$\min_{u \in X} \frac{\lambda}{2} \|u - f\|_2^2 + \|Ku\|_{\vec{b},a}. \tag{32}$$

The $\ell^2$ norm is the most suitable choice for suppressing Gaussian noise, since the energy (32)
corresponds to the maximum a posteriori estimate under this noise model. The proximity operator of
the fidelity term $G(u) := \frac{\lambda}{2} \|u - f\|_2^2$ is

$$\hat{u} = \operatorname{prox}_{\tau G}(u) \iff \hat{u} = \frac{u + \tau \lambda f}{1 + \tau \lambda}.$$

To determine the general behaviour of several CTV regularizations with respect to changing the
balancing parameter, Figure 4 shows the plots of the PSNR each method achieved for certain values
of $\lambda$. For these tests, we artificially added zero-mean Gaussian noise of standard deviation
25 to a noise-free color image. One observes that the peaks of the PSNR curves of the
regularizations using the $\ell^{\infty,1,1}$, $(S^1, \ell^1)$, $\ell^{2,1,1}$ and
$\ell^{\infty,2,1}$ norms achieve the highest values. Interestingly, although $\ell^{1,1,1}$ shows
one of the lowest performances in terms of the maximal PSNR, its corresponding curve seems to drop
more slowly as one overestimates $\lambda$. As is well known, the optimal value of $\lambda$ does
not always lead to a complete noise removal. However, a large reduction of the balancing parameter
provides an over-smoothed result and, thus, significant information is lost. In the end, the optimal
value in terms of the PSNR is obtained as a compromise between removing noise and preserving signal
content.



[Figure 4: plot; horizontal axis: regularization parameter.]

Figure 4: Comparison of CTV methods using different values of $\lambda$. The peaks of the curves of
$\ell^{\infty,1,1}$, $(S^1, \ell^1)$, $\ell^{2,1,1}$ and $\ell^{\infty,2,1}$ achieve the highest
values. Although $\ell^{1,1,1}$ shows one of the lowest performances in terms of the maximal PSNR,
its curve drops more slowly as one overestimates $\lambda$.


As an example of our experiments on CTV-$\ell^2$ denoising, we artificially added Gaussian noise
with standard deviation 30 to the twenty-third Kodak image and computed the PSNR value for
each reconstruction by comparing to the noise-free image. Picking the optimal value of $\lambda$ in
terms of the PSNR for each method, we obtained the results shown in Figure 5. We clearly observe
that the CTV regularization based on the $\ell^{\infty,1,1}$ norm provides the best PSNR value, and
its denoised image is superior to the others in visual quality. Indeed, see the strong color
artifacts on the parrot's cheek for all results except for the $\ell^{\infty,1,1}$ norm. Although
the $(S^1, \ell^1)$ norm shows nice denoising properties, a derivative matrix which has two
derivative vectors being equal to zero also has rank one, such that colored edges are not actively
suppressed. The large inter-channel correlation of images in the Kodak dataset explains why the
$\ell^{\infty,1,1}$ norm, which encourages jumps that occur in all channels in the sense given in
Section 4, performs visually the best. On the other hand, $\ell^{1,1,1}$ shows one of the worst
performances since it couples neither the colors nor the derivatives. Furthermore,
$(S^\infty, \ell^1)$ does not work very well. It seems that forcing the jumps of different color
channels to point into the same direction can be achieved more effectively by the convex relaxation
$(S^1, \ell^1)$ than by having a single direction in the dual variable as in the
$(S^\infty, \ell^1)$ approach. Finally, isotropic regularization is beaten by anisotropic
regularization in this test, and the new-proposed $\ell^{2,\infty,1}(der, col, pix)$ outperforms
one of the $(col, der, pix)$ norms in terms of both PSNR and visual quality assessment.

A more detailed comparison analysis on color image denoising by the CTV-$\ell^2$ model, supporting
software and an online demo will be made available soon.


6.2 Image Denoising: CTV-$\ell^1$ Model

If we replace the $\ell^2$ norm in the data-penalty term of (32) by the more robust $\ell^1$ norm,
the CTV-$\ell^1$ model arises:

$$\min_{u \in X} \lambda \|u - f\|_1 + \|Ku\|_{\vec{b},a}. \tag{33}$$

Some well-known advantages of (33) over the classical ROF model are contrast invariance and greater
effectiveness in removing noise containing strong outliers such as salt-and-pepper noise. In
this case, the proximity operator of the fidelity term $G(u) := \lambda \|u - f\|_1$ is

$$\hat{u} = \operatorname{prox}_{\tau G}(u) \iff \hat{u}_{i,k} = \begin{cases} u_{i,k} - \tau \lambda & \text{if } u_{i,k} - f_{i,k} > \tau \lambda, \\ u_{i,k} + \tau \lambda & \text{if } u_{i,k} - f_{i,k} < -\tau \lambda, \\ f_{i,k} & \text{if } |u_{i,k} - f_{i,k}| \leq \tau \lambda. \end{cases}$$

Note that the CTV-$\ell^1$ model poses a nonsmooth optimization problem, which is also treatable by
the PDHG algorithm.
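The three cases of this proximity operator amount to a soft thresholding of the difference $u - f$, which gives a compact vectorized form. The following sketch uses a hypothetical function name and assumes `u` and `f` are arrays of the same shape.

```python
import numpy as np

def prox_l1_fidelity(u, f, tau, lam):
    """prox of tau * lam * ||. - f||_1, evaluated at u.

    Moves u towards f by at most tau * lam per entry and snaps to f
    whenever the two are closer than that.
    """
    d = u - f
    return f + np.sign(d) * np.maximum(np.abs(d) - tau * lam, 0.0)
```

Entries of `u` that lie within $\tau\lambda$ of the data are set exactly to `f`, which is why this data term tolerates isolated outliers while leaving uncorrupted pixels untouched.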

Given the probability $\alpha \in [0, 1]$ that a pixel is corrupted, we introduced salt-and-pepper
noise by setting a fraction of $\alpha/2$ randomly selected pixels to black, and another fraction of
$\alpha/2$ randomly selected pixels to white. We display in Figure 6 the optimal result each method
provided on parts of the fifth Kodak image for $\alpha = 0.15$. At first glance, the regularization
using the newly-proposed $\ell^{\infty,1,1}$ norm is the most successful in suppressing color spots.
The numerical results confirm the previous visual inspection, since the PSNR value associated to the
denoised image given by $\ell^{\infty,1,1}$ is clearly superior to all others. In fact, this is the
only method that actively suppresses the input noise and preserves sharp edges. For instance,
observe that the edges separating saturated regions, such as the contours of the green and yellow
front mudguards, are especially damaged with all regularizations except $\ell^{\infty,1,1}$.
Finally, it is worth stressing that $\ell^{2,\infty,1}$ clearly outperforms


6.3 Image Deblurring 

The extension of the variational ROF model for image deblurring involves the minimization of the
primal energy

$$\min_{u \in X} \frac{\lambda}{2} \|Au - f\|_2^2 + \|Ku\|_{\vec{b},a},$$

where $A$ is a linear operator modeling the degradation of $u$ caused by blur and possibly noise.
For the following experiments, we focus on image deconvolution, which refers to the case where the
blur to be removed is linear and shift-invariant so that it may be expressed as a convolution of the
image with a point spread function. Accordingly, the linear operator is given by
$Au = \varphi * u$, where $\varphi$ is a Gaussian convolution kernel.

The proximal mapping of the fidelity term $G(u) := \frac{\lambda}{2} \|Au - f\|_2^2$ is given by

$$\hat{u} = \operatorname{prox}_{\tau G}(u) \iff (I + \tau \lambda A^* A)\, \hat{u} = u + \tau \lambda A^* f. \tag{34}$$

Note that the above formula requires computing $(I + \tau \lambda A^* A)^{-1}$, which is very
time-consuming in the spatial domain for large values of the standard deviation of the kernel. This
drawback is overcome by working in the Fourier domain, where the convolution becomes a mere
multiplication.

[Figure 5 panels: Clean; Noisy; denoised results including $\ell^{\infty,1,1}(col, der, pix)$ with
PSNR = 31.13 and $\ell^{2,\infty,1}(der, col, pix)$ with PSNR = 30.97; further $(col, der, pix)$
panels with PSNR = 30.14, 31.00 and 30.92.]

Figure 5: Close-ups of the ground truth, the input noisy data (additive Gaussian noise of s.d. 30),
and the denoised images obtained from the minimization of (32) on the twenty-third Kodak image.
For each method, the value of $\lambda$ which gave the best PSNR value was determined
experimentally. The PSNR value for each result is noted below the image. We observe that strong
color artifacts remain on the parrot's cheek in all results except for the $\ell^{\infty,1,1}$ norm.
In fact, this method gives rise to significantly better visual quality and provides the best PSNR
value. Although the runner-up comes close in terms of the numerical assessment, it is still far from
suppressing spots and color artifacts as $\ell^{\infty,1,1}$ does.

Figure 6: Close-ups of the ground truth, the input noisy data (15% of pixels with salt-and-pepper
noise), and the denoised images obtained from the minimization of (33) on the fifth Kodak image.
For each method, the value of $\lambda$ which gave the best PSNR value was determined
experimentally. The PSNR value for each result is noted below the image. We observe that the
$\ell^{\infty,1,1}$ norm is clearly superior in terms of the error as well as from a visual
inspection. Indeed, it is the most effective regularization to remove spots and preserve colors.
Note also that $\ell^{\infty,1,1}$ and (to a lesser extent) $\ell^{2,\infty,1}$ produce denoised
images with sharp contours, which does not happen in all other cases since edges separating colored
regions are damaged.

Hence, using the convolution theorem of Fourier transforms, the solution of (34) can be efficiently
computed as

$$\hat{u} = \mathcal{F}^{-1}\left( \frac{\mathcal{F}(u) + \tau \lambda\, \mathcal{F}(A)^* \mathcal{F}(f)}{1 + \tau \lambda\, |\mathcal{F}(A)|^2} \right), \tag{35}$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the Fast Fourier Transform (FFT) and the inverse
FFT, respectively. Note that all operations in the above formula are componentwise.
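A sketch of this Fourier-domain solve for a single channel is given below. It assumes periodic boundary conditions, so that the convolution is diagonalized exactly by the FFT, and a Gaussian point spread function sampled on the pixel grid and normalized to unit sum; the function names and these modeling choices are assumptions made for the illustration.

```python
import numpy as np

def gaussian_kernel_fft(shape, sigma):
    """FFT of a periodic Gaussian point spread function centered at the origin."""
    h, w = shape
    y = np.fft.fftfreq(h) * h          # integer offsets 0, 1, ..., -2, -1
    x = np.fft.fftfreq(w) * w
    yy, xx = np.meshgrid(y, x, indexing="ij")
    psf = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    psf /= psf.sum()
    return np.fft.fft2(psf)

def prox_deblur_fidelity(u, f, A_hat, tau, lam):
    """Solve (I + tau*lam*A^T A) x = u + tau*lam*A^T f componentwise in Fourier space."""
    num = np.fft.fft2(u) + tau * lam * np.conj(A_hat) * np.fft.fft2(f)
    den = 1.0 + tau * lam * np.abs(A_hat) ** 2
    return np.real(np.fft.ifft2(num / den))
```

For a color image, the same function would be applied to each channel separately, since the blur acts channel by channel in this model.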

We tested all CTV regularizations on the third Kodak image. The degraded data was simulated
by convolving the ground truth with a Gaussian kernel of standard deviation 2 and further adding
white Gaussian noise of standard deviation 0.5. The quality of the restored images with optimal
values of $\lambda$ can be evaluated both visually and numerically in Figure 7. We observe that the
blur has been almost entirely removed in all cases, even though some geometry and texture cannot be
recovered from the corrupted data. As expected from any TV based model, the restored images tend to
be piecewise smooth. In general terms, it seems that isotropic regularization is more suitable for
image deblurring - at least with very little noise - than anisotropic filtering. Indeed, two
isotropic regularizations, one of them newly proposed, provide the best PSNR values together with
the nuclear norm $(S^1, \ell^1)$. On the other hand, one realizes that $\ell^{\infty,1,1}$ and
$(S^1, \ell^1)$ are superior in removing color artifacts at the text on the cap. In the end, the
nuclear norm compromises between removing blur and avoiding color spots.


6.4 Image Inpainting 


Image inpainting is the process of filling-in lost data in a known region of an image. Although during 
the last years a lot of effort has been put into the development of powerful image priors, we are 
interested in the TV based image inpainting model [10] , which is limited to inpainting the geometric 
structure at unknown pixels. 

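Let $\mathcal{I}$ be the inpainting domain, that is, the set of all pixels in the image where the
intensity value of all color channels is unknown. Therefore, the primal problem we focus on is given
by

$$\min_{u \in X} \frac{\lambda}{2} \|u - f\|_{\mathcal{I}^c}^2 + \|Ku\|_{\vec{b},a}, \tag{36}$$

where $\|\cdot\|_{\mathcal{I}^c}$ denotes the Euclidean norm at known pixels. We see that the
proximity operator of $G(u) = \frac{\lambda}{2} \|u - f\|_{\mathcal{I}^c}^2$ is

$$\hat{u} = \operatorname{prox}_{\tau G}(u) \iff \hat{u}_{i,k} = \begin{cases} u_{i,k} & \text{if } i \in \mathcal{I}, \\ \dfrac{u_{i,k} + \tau \lambda f_{i,k}}{1 + \tau \lambda} & \text{otherwise.} \end{cases}$$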
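In array form, this proximity operator leaves the unknown pixels untouched and blends the remaining ones with the data. The sketch below assumes an image of shape `(H, W, C)` and a boolean mask of shape `(H, W)` that is `True` inside the inpainting domain; the function and argument names are hypothetical.

```python
import numpy as np

def prox_inpaint_fidelity(u, f, unknown, tau, lam):
    """prox of tau * G for the masked quadratic fidelity term.

    unknown: boolean array of shape (H, W), True where the data f is missing.
    """
    blended = (u + tau * lam * f) / (1.0 + tau * lam)
    # Inside the inpainting domain the data term vanishes, so u is returned as is.
    return np.where(unknown[..., None], u, blended)
```

Because the data term is inactive on the mask, only the regularizer determines the values there, which is what allows the geometric structure to be propagated into the missing region.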


For the comparative quality assessment in image inpainting, we used a mask with random scribbles.
In Figure 8, we show the optimal result in terms of the highest PSNR provided by each CTV
regularization on parts of the twentieth Kodak image. Since the image domain which is to be filled
in is thin, good numerical results are obtained in general. Indeed, all methods exhibit excellent
PSNR since an increase of about 20 dB is reached (the value of the input data is 20.58). Concerning
$\ell^{p,q,r}$ norms, one realizes that isotropic regularization performs significantly better than
anisotropic filtering. In this setting, observe that $\ell^{2,1,1}$ and $\ell^{\infty,1,1}$ provide
the lowest PSNR values as well as the worst inpainted images from visual quality assessment. On the
other hand, CTV methods based on the $\ell^{2,2,1}$, $(S^\infty, \ell^1)$, $(S^1, \ell^1)$ and
$\ell^{2,\infty,1}$ norms are significantly superior to all other regularizations both visually -
compare the results at the edge separating the two gray regions with different color scheme - and in
terms of the metric. As is typical, TV-based inpainting prefers straight contours as they have
minimal total variation, but it is less successful at recovering curved boundaries. In this
setting, one sees that all methods perfectly recover the color edge in the propeller of the plane,
but they fail at its yellow boundary.

Figure 7: Close-ups of the ground truth, the input blurred and noisy data (Gaussian convolution of
s.d. 2 and further additive Gaussian noise of s.d. 0.5), and the restored images each method
provided on the third Kodak image. For each CTV regularization, the value of $\lambda$ which gave
the best PSNR value was determined experimentally. The PSNR value for each result is noted below the
image. We observe that the blur has been almost entirely removed in all cases, although some spatial
details cannot be recovered from the corrupted data. As expected from TV based models, the restored
images tend to be piecewise smooth. Note also that the visual and numerical differences are not as
great as in the denoising case. However, it seems that isotropic diffusion is more suitable for
deblurring images with very little noise, since two isotropic regularizations, one of them newly
proposed, provide the best results together with the nuclear norm $(S^1, \ell^1)$.

Figure 8: Close-ups of the ground truth, the input masked image with random scribble, and the
inpainted images provided by the minimization of (36) on the twentieth Kodak image. For each
method, the value of $\lambda$ which gave the best PSNR value was determined experimentally. The
PSNR value for each result is noted below the image. In this case, we realize that anisotropic
diffusion severely compromises the performance of $\ell^{p,1,1}$ norms. Indeed, $\ell^{2,1,1}$ and
$\ell^{\infty,1,1}$ provide the lowest PSNR values as well as the worst inpainted images from visual
quality assessment. On the contrary, $\ell^{2,2,1}$, $(S^\infty, \ell^1)$, $(S^1, \ell^1)$ and
$\ell^{2,\infty,1}$ provide similar results and are able to better recover sharp edges damaged by
the mask.

CIE-L*a*b* space. All previous experiments were performed using the standard RGB color
space. CIE-L*a*b* is a perceptually uniform color space, a property which the common RGB
model does not have, describing all the colors visible to the human eye. The three coordinates L*,
a* and b* represent the lightness of the color, its position between red/magenta and green, and its
position between yellow and blue, respectively. Contrary to RGB color systems, in this space the
perceived color differences correspond to colorimetrically measured distances.

Figure 9 shows the results of minimizing (36) on both the RGB and CIE-L*a*b* color spaces by
means of CTV-$\ell^{2,\infty,1}$ regularization, which benefits from the superiority of isotropic
diffusion as demonstrated in Figure 8. We observe that the choice of the uniform color space leads
to a slight but visually noticeable improvement. Indeed, the bleeding of red across edges in RGB
space vanishes when transforming the image into CIE-L*a*b* before inpainting. Accordingly, the PSNR
gain is not negligible.



[Figure 9 panels: Clean; $\ell^{2,\infty,1}$ in RGB space, PSNR = 38.95; $\ell^{2,\infty,1}$ in
CIELab space, PSNR = 39.18.]

Figure 9: Close-ups of the ground truth and the inpainted images provided by the minimization of
(36) on the twentieth Kodak image. In each case, the value of $\lambda$ which gave the best PSNR
value was determined experimentally. First, we note that the PSNR gain in the CIE-L*a*b* space is
not at all negligible. But, more importantly, the visual quality assessment demonstrates that a
bleeding of red across edges appears in the RGB space. This effect vanishes when the inpainting is
carried out in the perceptually uniform CIE-L*a*b* color space.


7 Conclusions 

Considering the discrete setting, we have proposed to view the gradient of a multispectral image as 
a three dimensional matrix or tensor with the dimensions corresponding to the spatial extend, the 
directional derivatives considered as linear operators containing the differences to other pixels, and 
the color channels. We have then introduced collaborative total variation as the regularization that 
arises from taking different norms along each dimension. In particular, we have proposed to use
collaborative norms of the $\ell^{p,q,r}$ and Schatten $(S^p, \ell^q)$ types, leading to very different properties of the regularization.
We have provided relevant mathematical characterizations of the dual norm, the subdifferential and 
the proximal mapping of the proposed penalizations, which play a direct role in computing
optimality conditions of several regularized problems. We have further proved, using the generalized
concept of singular vectors, that an $\ell^\infty$ coupling leads to the strongest channel correlation, makes
the most prior assumptions, and has the greatest potential to reduce color artifacts. 

In experiments, we have demonstrated the wide applicability of the collaborative total variation to
general inverse problems like denoising, deblurring and inpainting. For the numerical computation
of the solution we have used the primal-dual hybrid gradient algorithm and stated all proximity
operators of the considered CTV regularizations. In these experiments, we have exhibited
the superiority of the $\ell^\infty$ channel coupling for a stronger suppression of color artifacts, and of
the isotropic regularizations for filling in thin regions.


A Singular Vector Analysis

We give here the mathematical details regarding the construction of singular vectors discussed
in the main text. Let us remark that the following analysis could be done in a continuous setting with
weak derivatives and distributions as long as there is a finite number of points at which the linearity
of the functions $l$ changes. Since our whole discussion about collaborative norms has dealt
with the discrete case, we limit the proofs in this section to the discrete setup, too.

Let $l : \{x_1, \dots, x_M\} \to [-1,1]$ be a discretization of a piecewise linear function such that the
piecewise linearity only changes at $\{-1, 1\}$. More precisely, define the slope as well as the discrete
derivative operator at each point as $s_i := D_x l(x_i) = l(x_i) - l(x_{i-1})$. Let $D_x^T$ denote the adjoint
operator of $D_x$, defined by analogy with the continuous setting: $\langle D_x^T l, f \rangle = \langle l, D_x f \rangle$. One checks
easily that $D_x^T l(x_i) = l(x_i) - l(x_{i+1})$. Then, we require either $s_{i+1} = s_i$, which means that we
are in the piecewise linear part, or $|l(x_i)| = 1$, which means that we are at a point where the type
of linearity changes. As a first step, let us state the following lemma, which will be needed in all
following proofs.

Lemma 2. Using the definitions and notations above, $s_i - s_{i+1} > 0$ implies $l(x_i) = 1$, and
$s_i - s_{i+1} < 0$ implies $l(x_i) = -1$. In particular, $l(x_i) D_x D_x^T l(x_i) = |s_i - s_{i+1}|$ and
$l(x_i) = \mathrm{sign}(s_i - s_{i+1})$ whenever $s_i \neq s_{i+1}$.

Proof. If $s_i - s_{i+1} > 0$, then $|l(x_i)| = 1$ due to the definition of $l$. Let us suppose that $l(x_i) = -1$. From
$l(x_j) \geq -1$ for $j \in \{1, \dots, M\}$, it follows that $l(x_{i-1}) \geq -1$ and, as a consequence, $s_i \leq 0$. Since
$s_{i+1} < s_i$, this means that $s_{i+1} = l(x_{i+1}) - l(x_i) < 0$, that is, $l(x_{i+1}) < -1$, which contradicts
$|l(x_{i+1})| \leq 1$. We thus deduce that $l(x_i) = 1$. The proof for the case $s_i - s_{i+1} < 0$ can be done in
a similar fashion.

The additional statement is a simple consequence of the first part along with $D_x D_x^T l(x_i) =
s_i - s_{i+1}$ by definition of the operators. □
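
The statement of Lemma 2 can be checked numerically on a small example. The sketch below is only an illustration: the zigzag profile `l` and the helper names are hypothetical, indices are zero-based, and only interior grid points are tested so that no boundary convention has to be assumed.

```python
def slope(l, i):
    """s_i = D_x l(x_i) = l(x_i) - l(x_{i-1})."""
    return l[i] - l[i - 1]


def dx_dxt(l, i):
    """D_x D_x^T l(x_i), built from D_x^T l(x_j) = l(x_j) - l(x_{j+1})."""
    return (l[i] - l[i + 1]) - (l[i - 1] - l[i])


def check_lemma2(l):
    """Verify Lemma 2 at interior points and return the indices where the slope changes."""
    kinks = []
    for i in range(1, len(l) - 1):
        jump = slope(l, i) - slope(l, i + 1)
        assert dx_dxt(l, i) == jump
        if jump != 0:
            assert abs(l[i]) == 1                    # slope changes only where |l| = 1
            assert l[i] == (1 if jump > 0 else -1)   # l(x_i) = sign(s_i - s_{i+1})
            assert l[i] * dx_dxt(l, i) == abs(jump)
            kinks.append(i)
    return kinks


# Hypothetical profile: slope 0.5 up to l = 1, slope -0.5 down to l = -1, then slope 1.
l = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, 0.0, 1.0]
kinks = check_lemma2(l)
```

All values are multiples of 0.5, so the comparisons are exact in floating point; for general profiles a tolerance would be needed.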

For each CTV regularization, the following results show that if $z_k^1$ and $z_k^2$ have some specific
expressions, then the associated image $u = D^T z$ is a singular vector of the energy $J(u) = \|Du\|_{p,q,r}$.

Theorem 5. For each $k \in \{1, \dots, C\}$, let $l_k^1$ and $l_k^2$ be discretizations of arbitrary piecewise linear
functions in $[-1,1]$, with the properties described previously. Let us consider $z_k^1(x_i,y_j) = c_k^1 l_k^1(x_i)$
and $z_k^2(x_i,y_j) = c_k^2 l_k^2(y_j)$ such that $c_k^1, c_k^2 \in \{0, \pm 1\}$, as well as
$u_k(x_i,y_j) = (D_x^T z_k^1)(x_i,y_j) + (D_y^T z_k^2)(x_i,y_j)$. Then, $u \in \partial J(u)$ for $J(u) = \|Du\|_{1,1,1}$.

Proof. We aim at proving $u \in \partial J(u)$ with $J(u) = \|Du\|_{1,1,1}$. Based on the characterization of the
subdifferential of $J$ given earlier, we have to show that $\|z\|_{\infty,\infty,\infty} \leq 1$, which is obvious due to
its construction, together with $\mathrm{sign}(D_x u_k(x_i,y_j)) = z_k^1(x_i,y_j)$ and $\mathrm{sign}(D_y u_k(x_i,y_j)) = z_k^2(x_i,y_j)$
for all $(x_i,y_j)$ at which $Du_k(x_i,y_j) \neq 0$. First, we can assume that $c_k^1 \neq 0$ since, otherwise, the
statement is trivially satisfied. Now, observe that

$$D_x u_k(x_i,y_j) = D_x D_x^T z_k^1(x_i,y_j) + D_x D_y^T z_k^2(x_i,y_j) = c_k^1 D_x D_x^T l_k^1(x_i) + c_k^2 D_x D_y^T l_k^2(y_j)$$
$$= c_k^1 D_x D_x^T l_k^1(x_i) = c_k^1 (s_i - s_{i+1}),$$

where $D_x D_y^T l_k^2(y_j) = 0$ since $D_y^T l_k^2(y_j)$ does not depend on $x$. Hence, $D_x u_k(x_i,y_j) \neq 0$ implies that
$s_i \neq s_{i+1}$ and, thus, $l_k^1(x_i) = \mathrm{sign}(s_i - s_{i+1})$ by Lemma 2. It follows that

$$\mathrm{sign}(D_x u_k(x_i,y_j)) = \mathrm{sign}(c_k^1 (s_i - s_{i+1})) = c_k^1 l_k^1(x_i) = z_k^1(x_i,y_j),$$

where in the second-to-last equality we have used $c_k^1 = \mathrm{sign}(c_k^1)$, which holds since $c_k^1 \in \{-1, +1\}$. The
proof of $\mathrm{sign}(D_y u_k(x_i,y_j)) = z_k^2(x_i,y_j)$ is similar and yields the assertion $\langle z, Du \rangle = \|Du\|_{1,1,1}$. □
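
To make the role of the vanishing cross term concrete, the following sketch checks the sign conditions of Theorem 5 on a small two-dimensional grid for a single channel. The profiles `l1` and `l2`, the coefficients and the function names are hypothetical choices for illustration; only interior pixels are visited, so boundary conditions are not modeled.

```python
def sign(v):
    """Sign of a real number as -1, 0 or 1."""
    return (v > 0) - (v < 0)


# Hypothetical profiles whose slope changes only where |l| = 1.
l1 = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0]   # depends on x only
l2 = [1.0, 0.0, -1.0, -0.5, 0.0, 0.5, 1.0]   # depends on y only
c1, c2 = -1, 1                               # coefficients taken from {0, +1, -1}


def z1(i, j):
    return c1 * l1[i]


def z2(i, j):
    return c2 * l2[j]


def u(i, j):
    """u = D_x^T z1 + D_y^T z2, using (D^T z)(x_i) = z(x_i) - z(x_{i+1})."""
    return (z1(i, j) - z1(i + 1, j)) + (z2(i, j) - z2(i, j + 1))


def check_single_channel():
    """Verify the sign conditions and return (<z, Du>, ||Du||_1) over interior pixels."""
    pairing, norm = 0.0, 0.0
    for i in range(1, len(l1) - 1):
        for j in range(1, len(l2) - 1):
            dx = u(i, j) - u(i - 1, j)   # backward difference in x
            dy = u(i, j) - u(i, j - 1)   # backward difference in y
            if dx != 0:
                assert sign(dx) == z1(i, j)
            if dy != 0:
                assert sign(dy) == z2(i, j)
            pairing += z1(i, j) * dx + z2(i, j) * dy
            norm += abs(dx) + abs(dy)
    return pairing, norm


pairing, norm = check_single_channel()
```

Because `z2` does not depend on the first index, its contribution cancels in `dx`, which mirrors the step $D_x D_y^T l_k^2(y_j) = 0$ of the proof.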

Theorem 6. Let $l^1$ and $l^2$ be discretizations of arbitrary piecewise linear functions in $[-1,1]$, with
the properties described previously. For each $k \in \{1, \dots, C\}$, let us define $z_k^1(x_i,y_j) = c_k^1 l^1(x_i)$ and
$z_k^2(x_i,y_j) = c_k^2 l^2(y_j)$ such that $c^1, c^2 \in \mathbb{R}^C$ with $\|c^1\|_2 = \|c^2\|_2 = 1$, and
$u_k(x_i,y_j) = (D_x^T z_k^1)(x_i,y_j) + (D_y^T z_k^2)(x_i,y_j)$. Then, $u \in \partial J(u)$ for $J(u) = \|Du\|_{2,1,1}$.

Proof. Similar to Theorem 5, we have to check that $\|z\|_{2,\infty,\infty} \leq 1$, which follows easily from $\|c^r\|_2 = 1$
and $|l^r| \leq 1$, as well as $\langle z, Du \rangle = \|Du\|_{2,1,1}$. Using Lemma 2, we see that

$$\sqrt{\sum_{k=1}^C (D_x u_k(x_i,y_j))^2} = \sqrt{\sum_{k=1}^C (D_x D_x^T z_k^1(x_i,y_j))^2} = \sqrt{\sum_{k=1}^C (c_k^1)^2 (s_i - s_{i+1})^2}
= |s_i - s_{i+1}| \cdot \|c^1\|_2 = |s_i - s_{i+1}|.$$

On the other hand, it follows that

$$\sum_{k=1}^C z_k^1(x_i,y_j) D_x u_k(x_i,y_j) = \sum_{k=1}^C (c_k^1)^2 \, l^1(x_i) D_x D_x^T l^1(x_i)
= |s_i - s_{i+1}| \cdot \|c^1\|_2^2 = |s_i - s_{i+1}|.$$

Therefore, we have obtained $\sqrt{\sum_k (D_x u_k(x_i,y_j))^2} = \sum_k z_k^1(x_i,y_j) D_x u_k(x_i,y_j)$ and, similarly,
$\sqrt{\sum_k (D_y u_k(x_i,y_j))^2} = \sum_k z_k^2(x_i,y_j) D_y u_k(x_i,y_j)$. These equalities prove that
$\langle z, Du \rangle = \|Du\|_{2,1,1}$, which yields the result. □
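
The two identities used in the proof of Theorem 6 can be verified numerically. The sketch below restricts itself to the $x$-direction (the $y$-part is analogous) and uses a hypothetical profile `l` and a hypothetical unit vector `c`; a small tolerance absorbs rounding in the Euclidean norm.

```python
import math

# Hypothetical shared profile and channel weights with unit Euclidean norm.
l = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, 0.0, 1.0]
c = [0.6, 0.0, -0.8]


def channel_gradients(l, c):
    """For each interior pixel i, return (z_k(x_i), D_x u_k(x_i)) as lists over channels k."""
    out = []
    for i in range(1, len(l) - 1):
        jump = 2 * l[i] - l[i - 1] - l[i + 1]   # equals s_i - s_{i+1}
        z = [ck * l[i] for ck in c]
        du = [ck * jump for ck in c]            # D_x D_x^T z_k(x_i)
        out.append((z, du))
    return out


def check_l2_coupling(l, c):
    """Check the pointwise identities and return the sum of channel-wise 2-norms of D_x u."""
    total = 0.0
    for z, du in channel_gradients(l, c):
        norm2 = math.sqrt(sum(d * d for d in du))
        pairing = sum(zk * d for zk, d in zip(z, du))
        assert math.sqrt(sum(zk * zk for zk in z)) <= 1 + 1e-12   # dual feasibility
        assert math.isclose(norm2, pairing, abs_tol=1e-12)
        total += norm2
    return total


total = check_l2_coupling(l, c)
```

With this profile the only nonzero contributions come from the two points where the slope changes, with jumps of magnitude 1 and 1.5.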


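Theorem 7. Let $l^1$ and $l^2$ be discretizations of arbitrary piecewise linear functions in $[-1,1]$, with
the properties described previously. For each $k \in \{1, \dots, C\}$, consider $c_k^1, c_k^2 \in \{0, \pm 1\}$ and define

$$z_k^1(x_i,y_j) = \begin{cases} \dfrac{c_k^1}{\|c^1\|_0} \, l^1(x_i) & \text{if } \|c^1\|_0 \neq 0, \\ 0 & \text{otherwise,} \end{cases}$$

and

$$z_k^2(x_i,y_j) = \begin{cases} \dfrac{c_k^2}{\|c^2\|_0} \, l^2(y_j) & \text{if } \|c^2\|_0 \neq 0, \\ 0 & \text{otherwise.} \end{cases}$$

Let us also define $u_k(x_i,y_j) = (D_x^T z_k^1)(x_i,y_j) + (D_y^T z_k^2)(x_i,y_j)$. Then, $u \in \partial J(u)$ for
$J(u) = \|Du\|_{\infty,1,1}$.

Proof. Once more, we need to show that $\|z\|_{1,\infty,\infty} \leq 1$, which follows from $|l^r(x)| \leq 1$ and
$\sum_k |c_k^r| = \|c^r\|_0$ due to $c_k^r \in \{0, \pm 1\}$, and $\langle z, Du \rangle = \|Du\|_{\infty,1,1}$. The latter is achieved if
$\max_k |D_x u_k(x_i,y_j)| = \sum_k z_k^1(x_i,y_j) D_x u_k(x_i,y_j)$. In the nontrivial case, $\|c^1\|_0 \neq 0$, we obtain

$$\max_{1 \leq k \leq C} |D_x u_k(x_i,y_j)| = \max_{1 \leq k \leq C} |D_x D_x^T z_k^1(x_i,y_j)|
= \max_{1 \leq k \leq C} \left| \frac{c_k^1}{\|c^1\|_0} D_x D_x^T l^1(x_i) \right|
= \max_{1 \leq k \leq C} \frac{|c_k^1|}{\|c^1\|_0} |s_i - s_{i+1}| = \frac{|s_i - s_{i+1}|}{\|c^1\|_0},$$

as well as

$$\sum_{k=1}^C z_k^1(x_i,y_j) D_x u_k(x_i,y_j) = \sum_{k=1}^C \frac{(c_k^1)^2}{\|c^1\|_0^2} \, l^1(x_i) D_x D_x^T l^1(x_i)
= \frac{|s_i - s_{i+1}|}{\|c^1\|_0^2} \sum_{k=1}^C (c_k^1)^2
= \frac{|s_i - s_{i+1}|}{\|c^1\|_0^2} \, \|c^1\|_0 = \frac{|s_i - s_{i+1}|}{\|c^1\|_0},$$

where we used $\sum_k (c_k^1)^2 = \|c^1\|_0$ because of $c_k^1 \in \{0, \pm 1\}$. Similarly, one shows that
$\max_k |D_y u_k(x_i,y_j)| = \sum_k z_k^2(x_i,y_j) D_y u_k(x_i,y_j)$, which ends the proof. □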
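
A numerical check of the $\ell^\infty$ case follows the same pattern as the proof. In this sketch, again restricted to the $x$-direction, the profile `l`, the sparse sign vector `c` and the function name are hypothetical; the point illustrated is that the channel-wise maximum of $|D_x u_k|$ coincides with the pairing $\sum_k z_k^1 D_x u_k$.

```python
import math

# Hypothetical shared profile and channel coefficients in {0, +1, -1}.
l = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, 0.0, 1.0]
c = [1, 0, -1, 1]


def check_linf_coupling(l, c):
    """Check max_k |D_x u_k| = sum_k z_k D_x u_k at interior pixels; return the summed maxima."""
    n0 = sum(1 for ck in c if ck != 0)       # ||c||_0, the number of active channels
    if n0 == 0:
        return 0.0                           # trivial case: z = 0 and u = 0
    total = 0.0
    for i in range(1, len(l) - 1):
        jump = 2 * l[i] - l[i - 1] - l[i + 1]        # equals s_i - s_{i+1}
        z = [ck / n0 * l[i] for ck in c]
        du = [ck / n0 * jump for ck in c]            # D_x D_x^T z_k(x_i)
        assert sum(abs(zk) for zk in z) <= 1 + 1e-12   # dual feasibility in the 1-norm
        peak = max(abs(d) for d in du)
        pairing = sum(zk * d for zk, d in zip(z, du))
        assert math.isclose(peak, pairing, abs_tol=1e-12)
        total += peak
    return total


total = check_linf_coupling(l, c)
```

Here three channels are active, so each jump of the profile contributes its magnitude divided by three, and all active channels share the same edge locations.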


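B Proof of Theorem 3


We first prove (24). Let $\xi = \sum_j q_j v_j$ with $q = (q_1, \dots, q_m) \in \partial g(f(x_0))$ and $v_j \in \partial f_j(x_0)$. From
$q \in \partial g(f(x_0))$ one has that

$$g(f(x)) \geq g(f(x_0)) + \langle q, f(x) - f(x_0) \rangle, \quad \forall x \in \mathbb{R}^n,$$

and each condition $v_j \in \partial f_j(x_0)$ yields

$$f_j(x) \geq f_j(x_0) + \langle v_j, x - x_0 \rangle, \quad \forall x \in \mathbb{R}^n.$$

By using the above inequalities, we finally obtain

$$(g \circ f)(x) \geq g(f(x_0)) + \sum_{j=1}^m q_j \langle v_j, x - x_0 \rangle = (g \circ f)(x_0) + \langle \xi, x - x_0 \rangle, \quad \forall x \in \mathbb{R}^n.$$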
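
The inequality just derived can be illustrated on a one-dimensional toy instance. The hypotheses of Theorem 3 are not restated in this appendix, so the sketch below makes its own assumptions: $f_1(x) = x^2$, $f_2(x) = |x - 1|$ and $g(y) = \max(y_1, y_2)$ are hypothetical choices, with $g$ nondecreasing so that every subgradient $q$ has nonnegative entries. The point $x_0$ is chosen where both $f_j$ coincide, which makes $\partial g(f(x_0))$ a whole segment.

```python
import math


# Hypothetical instance: f_1(x) = x^2, f_2(x) = |x - 1|, g(y) = max(y_1, y_2).
def f(x):
    return (x * x, abs(x - 1.0))


def g(y):
    return max(y)


x0 = (math.sqrt(5.0) - 1.0) / 2.0   # f_1(x0) = f_2(x0), so g has a kink at f(x0)
v = (2.0 * x0, -1.0)                # v_j in the subdifferential of f_j at x0 (note x0 < 1)


def satisfies_subgradient_inequality(xi, grid, tol=1e-9):
    """Check (g o f)(x) >= (g o f)(x0) + xi * (x - x0) for every x in the grid."""
    base = g(f(x0))
    return all(g(f(x)) >= base + xi * (x - x0) - tol for x in grid)


grid = [k / 100.0 for k in range(-500, 501)]

# q = (t, 1 - t) with t in [0, 1] is a subgradient of max at a tie.
results = [
    satisfies_subgradient_inequality(t * v[0] + (1.0 - t) * v[1], grid)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0)
]
```

Every candidate $\xi = q_1 v_1 + q_2 v_2$ passes the check on the grid, whereas a slope outside this set, such as 5, violates the inequality near $x = 1$.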

Assume now that $x_0 \in \operatorname{int} \operatorname{dom}(g \circ f)$ and $f_j$ is locally l.s.c. for all $j \in \{1, \dots, m\}$. Let $\hat{\partial}$, $\partial$ and $\partial^\infty$
denote the regular, general and horizon subdifferentials [44], respectively. For $q \in \mathbb{R}^m$, we introduce
the notation $(qf)(x) := \sum_j q_j f_j(x)$. Since $f$ and $g$ are proper and convex functions, [44, Proposition
8.12] implies $\hat{\partial}(qf)(x_0) = \partial(qf)(x_0)$, $\hat{\partial} g(f(x_0)) = \partial g(f(x_0))$, and $\hat{\partial}(g \circ f)(x_0) = \partial(g \circ f)(x_0)$.
Furthermore, the properties of $f_j$ allow applying the same proposition to see

$$\partial^\infty f_j(x) = \{ \zeta \in \mathbb{R}^n : \langle \zeta, y - x \rangle \leq 0, \ \forall y \in \operatorname{dom} f_j \} = \{0\}, \quad \forall x \in \mathbb{R}^n,$$

which is equivalent to $f$ being strictly continuous by [44, Theorem 9.13]. Note that [44, Proposition
8.12] also yields

$$\partial^\infty g(f(x_0)) = \{ \zeta \in \mathbb{R}^m : \langle \zeta, y - f(x_0) \rangle \leq 0, \ \forall y \in \operatorname{dom} g \},$$

from where one deduces $\partial^\infty g(f(x_0)) = \{0\}$ due to $x_0 \in \operatorname{int} \operatorname{dom}(g \circ f)$. Putting it all together, [44,
Theorem 10.49] applies and so

$$\partial(g \circ f)(x_0) \subset \bigcup \{ \partial(qf)(x_0) : q \in \partial g(f(x_0)) \},$$

$$\partial^\infty(g \circ f)(x_0) \subset \bigcup \{ \partial(qf)(x_0) : q \in \partial^\infty g(f(x_0)) \}.$$

Finally, the previous two inclusions along with the equivalence of regular and general subdifferentials
lead to the equality in (24).



